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ABSTRACT 


Five  main  classes  of  n  rooting  methods  are  dis- 

t/h 

cussed  in  t^s  report.  An  n  rooting  method  derivable  from 

the  binomial  series  expansion  is  developed,  and  both  re> 

storing  and  nonrestoring  versions  are  treated.  For  the 

special  case  of  the  binary  square  root,  a  nonrestoring 

version  of  this  method  using  normalized  remainders  is  sim- 

ulated  and  a  statistical  timing  distribution  obtained. 

other  n^  rooting  methods  discussed  are  a  truncated 

series  method,  Euler  iteration  formulae,  extensions  of  a 

square  root  method  given  by  M.  Kadler,  Fade  approximations 

and  the  log-exponential  method.  A  particular  mechanization 

of  the  log  and  exponential  functions  developed  by  Cantor, 

th 

Estrin,  and  Turn  is  compared  timewise  v/ith  the  other  n 
rooting  methods.  Hardware  and  storage  requirements  are 
considered  in  all  cases. 

It  is  concluded  that  the  log-exponential  mechaniza¬ 
tion  of  Cantor,  Estrin,  and  Turn  is  the  fastest  and  most 
versatile  except  for  very  small  values  of  n.  The  binomial 
scries  method  is  foxmd  to  be  fastest  for  the  binary  square 
root. 
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CHAPTER  I 


Introduction 

Important  element£a:y  functions  rarely  included 
in  the  basic  set  of  operations  of  most  computers  are  the 
integral  roots  of  an  operand.  In  particular,  the  square 
root  plays  an  Important  role  in  the  solution  of  quadra¬ 
tic  equations,  phasor  algebra,  asymptotic  expansions  of 
Bessel  functions,  and  a  host  of  other  applications.  Less 
frequently  required  are  the  higher  integral  roots.  This 
report  concentrates  its  attention  on  integral  n^^  roots, 
v/ith  particvdar  emphasis  on  the  square  root. 

Programmed  Methods  for  General  Purpose  Digital 
Computers 

The  most  common  methods  available  to  computer 
users  are  program  libreiry  subroutines.  The  following  ex¬ 
amples  are  IBU  oriented,  but  can  be  considered  represen¬ 
tative.  Most  of  the  coded  subroutines  available  through 
the  SHARE  organization  are  for  the  square  root  only,  and 
apply  to  floating-point  operands.  One  of  the  fastest  is 
SHARE  distribution  no.  721,  which  uses  a  least-squares 
approximation  follov/ed  by  two  Nev/ton— Raphson  iterations, 

-ft 

with  a  maximum  relative  error  of  2.3  X  10  .  The  routine 


1 


requires  30  v/ords  of  storage,  and  through  clever  coding 
executes  a  single-precision  square  root  in  67  IBU  7090 
machine  cycles  (1  cycle  ^  2.18  microseconds). 

In  contrast  to  the  intricately  coded  case  above, 
there  is  an  n^^  root  subroutine  (n  integral)  available 
(SHARE  distribution  no.  690)  which  builds  up  the  root 
digit  by  digit  in  a  trial-and-error  fashion,  checldLng 
each  bir*ary  digit  by  raising  the  trial  root  to  the  n 
power,  thus  using  a  great  many  multiplications. 

Lastly,  it  is  interesting  to  note  how  the  IBU 
POETRAK  II  compiler  sets  up  the  exponentiation  operation 
X**p,  If  p  l3  an  integer  less  than  8,  the  operation  is 
executed  as  a  series  of  P-1  multiplications.  If  P  is  an 
integer  greater  than  or  equal  to  8,  a  log-exponential 
sequence  is  used.  Also,  if  P  is  not  an  integer  (as  in  the 
case  of  n  "  roots),  the  log-exponential  sequence  is  used. 
If  the  PORTPJlN  programmer  desires  the  square  root  he  may 
use  the  special  routine  (SQRT)  provided,  which  uses  the 
least-squares-two  Newton-Haphson  iteration  sequence. 
Objective  and  Scope 

In  this  report  five  main  classes  of  n  rooting 
methods  are  discussed  from  the  standpoint  of  timing  and 
mechanization. 
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The  first  method,  called  the  binonlal  theorem 
method,  Is  in  the  same  class  as  ordinary  long  division 
euid  is  shovm  to  he  a  higher-order  exterjsion  of  the  divis¬ 
ion  process.  Its  formulation  relies  heavily  upon  the  ved- 
ues  of  the  binomial  coefficients  for  different  values  of 
n.  Both  restoring  and  nonrestoring  methods  are  discussed, 
and  a  nonrestoring  method  using  normalized  remainders 
whose  speed  depends  upon  the  statistical  distribution  of 
the  various  remainders  during  the  rooting  process  is  out¬ 
lined.  The  simplest  case,  the  square  root,  has  been  sim¬ 
ulated  and  the  resulting  distribution  of  execution  times 
obtained.  Inherent  difficulties  in  the  binomial  theorem 
method  for  higher  roots  are  pointed  out. 

A  second  n^^  rooting  process  considered  is  one 
that  relies  upon  the  operand  being  in  a  favorable  Interval 
such  that  its.n^”  root  can  be  expressed  as  a  correctable 
truncated  series  having  very  few  terns.  The  operand  is 
forced  into  this  favorable  interval  by  using  stored  const¬ 
ant  multipliers  obtained  by  table  loohups.  The  nature  of 
these  constants  as  well  as  stored  constants  to  correct 
the  result  obtained  from  the  truncated  series  are  pre¬ 
sented,  Guid  table  sizes  are  given  as  a  function  of  speed 
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and  accuracy.  A  related  method  which  forces  the  operand 
into  a  given  interval  near  unity  while  another  transfor¬ 
mation  dependently  forms  the  n^^  root  is  discussed* 

Another  class  of  n^  rooting  procedures  covered 
are  those  derivable  from  Euler's  formula.  A  derivation  of 
order  n  rooting  processes  obtainable  from  Eiiler's 
formula  as  developed  by  J.  F*  Traub  in  a  recent  article 
is  presented  and  their  timing  and  mechanization  are  dis¬ 
cussed. 

A  fourth  method  considered  is  the  approximation  of 
the  n*^  root  by  a  rational  fraction  which  is  the  ratio  of 
tv/o  polynomials  involving  the  operand.  This  type  of  ap¬ 
proximation  is  called  the  Fade  approximation,  after  the 
mathematician  who  formulated  it.  A  special  case,  the  Fade 
approximation  of  order  one,  is  analyzed  in  some  detail 
with  respect  to  its  precision  for  different  values  of  n. 

Lastly,  the  familiar  logarithm-antilogarithm  meth¬ 
od  of  extracting  n^^  roots  v/ill  be  treated,  using  as  an 
example  a  configuration  developed  by  Cantor,  Estrin,  and 

X 

Turn  which  generates  the  elementary  functions  In  x  and  e 
for  any  given  x. 

For  clearly  competitive  methods,  comparisons  are 

th 

made  with  the  log-exponential  approach  to  the  n 
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rootlzig  problem,  and  the  points  at  which  nechanlzatlon  of 
the  methods  in  question  becone  as  time  consuming  as  the 
log-exponential  method  are  estimated.  In  all  cases  para¬ 
meters  such  as  hardv^are  or  storage  requirements  are  de¬ 
fined  along  with  the  potential  parallelism  inherent  in 
the  procedmre. 
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CHAPTER  II 


Application  oi  the  Binomial  Theorem  to  the  Extraction 
of  Roots  of  Integral  Order 

A  given  positive  real  integer  of  rik  digits  nay  be 
represented  in  the  visual  positional  notation  as 

*  =  ♦  I>o  > 

Where  =  i*^  digit,  0  <  <  B,  and 

B  base  of  the  number  system  used. 

Both  n  and  k  are  positive  Integers,  and  thus  A  consists 
of  an  integral  multiple  of  n  digits.  In  addition,  let  it 
be  req.uired  that 

♦V 

S“nk-3  >  0  ■ 

J*' 

i.e.,  at  least  one  of  the  n  most  significant  digits  of  A 
is  nonzero.  Similarly,  let  another  positive  real  integer 
of  k  digits  and  with  the  same  base  as  A  be  given  in  posi¬ 
tional  notation  as 

a  *  +  ••••  +  dj^B  +  Iq  ,  (3) 

where  d^  «  i*^  digit,  0  <  d^  <  B. 

Let  the  two  Integers  A  and  a  be  related  by  the  reciprocal 
relations 
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where 


a  =  Int,{ocJ’  and  (4) 

A  »  oc“  ,  (5) 

oc  =  ,  (6) 

and  the  operation  Int.{  }  means  the  integer  part  of  the 
expression  in  brackets.  It  is  generally  true  that  the  pos- 
itive  real  n*^  root  of  a  positive  Integer  is  not  expres¬ 
sible  exactly  as  another  positive  integer,  and  we  sheQ.1 
regard  a  as  the  integer  part  of  Oi  ,  the  exact  positive 
real  n  root  of  A.  The  problem  is,  then,  to  determine  the 
digits  of  the  integer  part  of  the  positive  real  n^ 
root  of  A  leaving  been  given  the  digits  of  A  itself. 

Por  convenience  in  notation,  let  us  introduce  the 
substitution 

Xi  =  (7) 

into  (3)  in  order  that  the  expression  for  a  assume  a  more 
convenient  multinomial  form.  Doing  this, 

®  +  Vl  ^ 

Now  approximate  ot  by  its  integer  part,  and  substitute  (8) 
into  (5)  yielding 

A  =  (xj^  +  x,,_j^  +  +  x^)*^  .  (9) 

Let  us  now  attack  the  problem  in  reverse  fashion  by  focus¬ 
ing  attention  on  the  digits  of  a.  As  a  first  approximation 
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let  *  Xj^i  i.e,,  let  a  te  approximated  by  its  highest 
order  component^.  In  a  like  manner,  then,  a  first  approx¬ 
imation  to  A  is  defined  as  succeed¬ 

ing  better  approximations  to  a  be  defined  as 

^  «  1,2,3,...  ,  (10) 

where  aQ  *  0.  Equation  (10)  clearly  shows  that  a  is  being 
built  up  digit  by  digit  toward  the  desired  value,  Int.{*}. 
The  3^^  approximation  to  A  is 


(11) 

?rom  equation  (10)  it  is  clear  that 

“  ®d-i  ^k-d+i  » 

(12) 

and  thus 

*  ^®d-i  *k-d+i^*^  • 

(13) 

Expanding  (13) 

using  the  binomial  theorem. 

=  ®d-l 
0^  “  ^d-1 

*  t*^3-i^k-d+i  *3c-d+il 

• 

(14) 

By  definition,  Aq  «  0. 

Equations  (14)  and  (12)  represent  an  iterative  se- 

,  th 

quence  that  nay  be  used  to  extract  the  positive  real  n 
root  of  a  given  positive  real  integer.  Since  the  integer 
part  of  the  desired  root  is  built  up  digit  by  digit,  the 


By  a  component  is  meant  the  digit  times 
the  power  of  B. 
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sequence  of  approximations  obeys  ^  therefore 
the  approximations  a^  approach  a  monotonlcally  from  below. 
Equation  (10)  ensures  that  *  a,  and  that 
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*  ec  , 


Thus,  Ot  -  a^  ^  C.  »  €>0,  i.e,,  the  error  Oc  -  a^  nay  be 

nade  as  small  as  desired  by  merely  executing  more  stages 
of  the  iterative  process  (14).  V/e  may,  then,  extract  the 
n*^  root  of  A  beyond  its  integer  part  to  as  many  places  as 
desired. 


Specialization  to  a  Restoring  [lOj  Type  Procediire  for 


Obtaining  the  Square  Root  of  a  Re 


Let  xis  rev/rite  (14)  by  considering  the  remainder  at 
each  stage  of  the  iterative  process.  Let 
malce  this  substitution  in  equation  (14),  giving 

+  ••••  *  ’  ‘^5) 

Where  R-  =  A.  R.  is  the  remainder  that  results  from  the 

U  J 

stage  of  the  process.  The  3^^  remainder  is  obtained  by 
subtracting  the  terms  in  brackets  from  the  previous  re- 
malnderi  thus  obtaining  a  root  digit  in  the  process*  Be¬ 
cause  the  components  arc  postulated  to  be  the  actual 

th 

components  of  the  Integer  part  of  the  exact  n  root,  it 
is  clear  that  0  ^  ^  R^_j^  • 
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Relation  to  PivlBlon 

It  is  Instructive  to  point  out  the  sinilarlty  be¬ 
tween  the  rooting  process  outlined  in  (13)  and  the  restor¬ 
ing  type  division  process.  Usin^  the  notation  of  (8),  we 
may  write  out  the  division  problem  U/V  «  W,  where  U  is  the 
divldendi  V  the  divisor,  and  W  the  quotient. 

(“p  +  Vi  ■*  •••  *  *  Vi  *  •••  ♦ 

‘"m  *  ’p-4-l  *  •••  *  ’l> 


Where  p  and  q  are  positive  integers,  p  >  q.  In  a  manner 
similar  to  that  of  the  rooting  process,  the  quotient  W  may 
be  built  up  digit  by  digit  in  the  follov/lng  manner: 

'"o'O-  (17  > 

Paralleling  the  rooting  process,  the  approximation  to 
U  ■  W  may  be  voritten  Uj  »  VW^,  Therefore, 

Introducing  the  remainder  \  ^  ,  the  division  pro¬ 

cess  (18)  becomes 


which  displays  its  obvious  similarity  to  the  rooting  pro¬ 
cess  in  (15).  In  fact,  if  n  »  1  in  (15),  the  rooting  pro¬ 
cess  reduces  to  the  trivial  division  problem  A/l  if  the 
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process  is  carried  out  an  infinite  number  of  stages.  It  should  be 


noted  that  the  trial  subtrahend  Vw 

p-q-j+1 


in  the  division  process 


(19)  is  functionally  independent  of  the  partial  quotient  W. 
v/hereas  the  trial  factor  '*'* 

the  rooting  process  (15)  is  functionally  dependent  on  the 
partial  root  This  dependence  is  linear  in  the  case 

of  the  square  root  (n  =  2),  quadratic  in  the  case  of  the 
cube  root  (n  =  3),  and  so  on.  This  functional  dependence  is 
important  in  the  nonrestoring  rooting  process  discussed 
later. 


In  order  to  mechr  '-’ze  the  rooting  process  In  (15) 
on  electronic  digital  computing  machinery,  a  simple  sys¬ 
tematic  method  for  generating  the  trial  factors  is  desir¬ 
ed.  Let  us  write  (15)  in  the  form  ■=  -  E^(d)  t 

v;here  Ej(d)  **  *  ^i'-j+l  *  argument 

d  of  Ej(d)  is  the  digit  pecrt  of  which  is  to  be  de¬ 

termined  during  the  stage  of  the  process.  Clearly 
*0)  »  0,  so  we  need  to  know  the  B-1  trial  factors  Ej(l), 
E^(2),...,  Ej(B-l).  In  the  restoring  method  the  trial  fac¬ 
tors  are  generally  subtracted  from  the  remainder  in  a 
"differential"  fashion,  i.e.,  *" 
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t 


etc.,  until  a  negative  remainder  is 


-{eJ(2)  -  eJ(1)} 

sensed,  at  which  tine  the  process  "regresses"  one  step  hy 
adding  on  the  previously  subtracted  item.  This  approach 
obviously  acconplishes  the  desired  result,  i.e,,  the 
snallest  Rj  -  E^(d)  >  0  is  computed,  yieldii^g  the  desir¬ 
ed  root  digit  d.  If  at  any  stage  of  the  process  the  resul¬ 
ting  remainder  is  zero,  the  process  terminates  because 
an  exact  root  has  been  found.  The  maximum  length  of  a  de¬ 
termines  the  maximma  number  of  stages  of  the  rooting  pro¬ 
cess,  sj.nce  one  root  digit  is  obtained  per  stage.  If  the 
"differential"  subtracting  method  is  used,  it  is  expected 
that  on  the  average  about  |■(B-l)  +  1  subtractions  plus  one 
readdition  must  be  performed  per  stage  of  the  process.  If 
the  binary  number  system  is  used,  the  unknov/n  root  compon- 
ent  Xj,  to  be  determined  on  the  3^  stage  nay  be  assum¬ 
ed  to  have  a  digit  part  of  "1",  the  trial  factor  Ej(l) 
formed  and  compared  with  and  the  appropriate  action 

ta]:en. 
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(21) 


Ts  «  n  j.  ^nk-njl 

Because  of  the  restrlctioix  placed  upon  A  in  (2),  i.e., 
that  at  least  one  of  the  n  highest  order  digits  of  A  be 
nonzero y  the  highest  order  root  digit  must  be  nonzero. 
That  is,  a^  =  2^”^,  Since  i  Sg  <•••*<  a^  ® ^  £  •••* 

^  eij^,  then 


2^"^  <  a^  <  2^ 


(22) 


Let  us  now  examine  the  mechanization  required  to 
execute  each  iterated  stage  of  the  process,  i.e.,  genera¬ 
tion  of  the  trial  factor  for  particular  values  of  n,  and 
subtraction  from  the  remainder  ‘tJi®  case  of  the 

square  root  (ns  2), 

“3- 

Using  (22), 

2^^-i  4  .  (24) 

Equation  (24)  shows  that  the  highest  order  digit  of  the 
trial  factor  will  always  appear  in  bit  position  2k-j  at 
the  beginning  of  the  stage  of  the  process,  which  means 
that  it  moves  one  position  right  during  execution  of  each 
stage  of  the  iterative  process.  By  noting  that  2k-2j  < 
2k-3,  Js  1,2,3,...,  a  "1"  need  only  be  inserted  (not  add¬ 
ed)  into  bit  position  2k-2j  to  account  for  the  rest  of  the 
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trial  factor I  since  a  carry  cannot  occur  because  of  (24)* 

It  is  clear  that  the  rcjialnder  Is  decreasing  In  magni¬ 
tude  with  each  succeeding  stage  of  the  process.  To  econo¬ 
mize  on  register  requirements,  let  us  shift  the  remainder 
left  one  bit  position  after  the  execution  of  each  stage. 
This  means  that  after  J  stages  the  remainder  will  be  mult¬ 
iplied  by  2^.  Inserting  this  in  (23), 

2^Rj  s  2^Rj  +  2^^"^  j  ,  (25) 

and  thus  the  leading  bit  of  the  triaJ.  factor  remains  sta¬ 
tionary  throughout  the  entire  square  rooting  process.  A 
similar  procedure  can  be  applied  to  the  expressions  invol¬ 
ved  in  the  higher  rooting  processes.  In  the  usual  sin^e 
precision  case,  a  k-bit  root  is  extracted  from  a  k-blt  op¬ 
erand,  where  k  is  the  number  of  bits  in  a  sin^e  precision 
word.  If  this  is  the  case,  the  registers  have  the  formats 
sho\m  below:  ^ 

I  Partial  Root 


s 


□ 

[ 

REMAINDER 

8 

1 

c 

TRI.\L  FACTOR 

K  Int^ri/2^—f« -  (n-l)k 


Figure  2-1;  Register  Formats  for  the  Fixed- 
Point  Binomial  Theorem  Root. 
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The  remainder  register  is  (n-l)lc  +  Int.{n/2}  bits,  and 
the  trial  factor  register  is  one  bit  less,  or  (n-l)k  ♦ 
Int.{n/2}  -  1  bits,  both  registers  having  an  additional 
sign  bit.  The  partial  root  register  must  have  attached  to 
it  some  provision  for  building  up  the  root  bit  by  bit 
starting  at  the  high  order  end.  A  coimter  with  k  sequent¬ 
ial  states  and  decoding  circuits  which  select  one  input 
line  at  each  stage  of  the  process  could  enable  this  opera¬ 
tion. 

As  n  gets  larger,  the  mechanization  complexity  inr- 
creases.  The  additional  terms  acquired  in  the  trial  factor 
might  be  formed  simultaneously  in  other  registers  or  se¬ 
quentially  formed  and  added.  For  the  case  of  extreme  par- 
"allelism  the  extraction  of  the  n^^  root  could  utilize  n-2 
multipliers,  n-2  shifters,  and  one  adder  in  addition  to 
the  registers  already  mentioned. 

Kormallzed  Remainders 

Recalling  for  the  moment  the  square  root  algorithm 
in  (25),  we  see  that  the  trial  factor  is  at  least  as  large 
as  2  .  It  is  then  clear  that  if  the  previous  remainder 

<  2^^“^,  i.e.,  it  has  "leading"  zeros,  may  be 

shifted  left  until  a  "1"  appears  in  bit  position  21:-1.  As 
a  result,  additional  zero  bits  are  introduced  into  the 
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partial  root,  one  for  each  position  the  renainder  is  shif¬ 
ted  left.  The  advantage  of  this  procedure  is  that  addition¬ 
al  digits  of  the  root  are  generated  using  simple  shifts, 
without  having  to  resort  to  time  consuming  comparisons. 

The  number  of  normalizing  shifts  made  at  any  given  point 
in  the  iterative  process  depends  upon  the  statistical 
distribution  of  the  remainder  magnitude  throughout  the 
rooting  process.  Following  C,  V.  Preiman  [4]  ,  let  ur  es¬ 
tablish  a  "figiare  of  merit"  for  the  restoring  algorithm 
with  normalized  remainders  by  defining  an  iteration  as  a 
comparison  and  conditional  subtraction,  a  normalization, 
formation  of  a  new  trial  factor,  and  condltior.al  altera¬ 
tion  of  the  partial  root.  Thus  it  is  seen  that  an  itera¬ 
tion  may  consist  of  more  them  one  stage  of  the  rooting 
process.  The  figure  of  merit  is  the  number  of  root  bits 
formed  during  each  iteration.  Similar  remeilnder  normaliza¬ 
tion  procedures  may  be  defined  for  the  higher  order  root¬ 
ing  processes. 

Konrestoring  [lo]  Algorithm  for  n^^  Rooting 

The  binary  rooting  methods  previously  discussed 
were  of  the  restoring  type.  As  is  done  in  division,  a  noo- 
restoring  modification  of  the  restoring  procedure  nay  be 
employed  to  extract  the  n  root  of  a  binary  integer. 
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Juppose,  on  each  stage  of  the  process,  the  digit 
part  of  the  desired  root  conponent  is  assuned  to  be 

a  "1"  as  was  done  in  the  restoilng  procedure.  Let  the 
trial  factor  be  formed  as  usual,  but  now  let  negative  re¬ 
mainders  be  allowed.  Let  us  now  proceed  in  such  a  way  as 
to  decrease  the  magnitude  of  the  remainder,  i.e.,  when 

>  0  subtract  the  trial  factor  from  it;  when  ^  <  0 
add  the  trial  factor  to  the  remainder.  Provided  the  root 
digits  are  formed  correctly,  using  the  nonrestoring  scheme 
ought  to  offer  a  time  advantage  over  the  restoring  method, 
because  addition  or  subtraction  of  the  trial  factor  takes 
place  without  regard  to  the  relative  magnitudes  of  the  re¬ 
mainder  and  trial  factor  (assuming  all  normalizing  shifts 
have  talcen  place),  but  only  with  regard  to  the  sign  of  the 
remainder  Rj_^  . 

Honrestoring  n^  P.ooting  Method  V/ith  Normalized  Remainders 
As  was  the  case  in  the  restoring  n^^  rooting  algor¬ 
ithm,  the  trial  factor  has  a  fixed  minimum  magnitude. 

Thus,  by  noting  the  magnitude  of  normalizing  shifts 

can  be  made  to  introduce  additional  digits  into  the  par¬ 
tial  root  v/ithout  the  necessity  of  addition  or  subtrac¬ 
tion.  The  process  is  unconplicated  if  we  consider  a  signed 
magnitude  niomber  representation. 
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Suppose  v;e  are  in  the  j  sta^e  of  the  rooting  pro¬ 
cess,  the  remainder  is  positive  and  normalized,  and  the 
trial  factor  has  been  formed.  The  difference  is  then  form¬ 
ed,  and  let  vis  suppose  tliat  this  resulting  difference  is 
negative.  Intuitively,  by  a  comparison  to  the  restoring 
method  we  know  that  the  digit  part  of  ^vaa  been 

found  to  be  zero,  so  let  the  partial  root  be  augmented 
with  this  zero  bit.  Now  the  new  (negative)  remainder,  ad¬ 
justed  left  one  bit  position  to  accovmt  for  the  factor  2^, 
may  or  may  not  have  leading  zeros  vdth  respect  to  the  fix¬ 
ed  minimum  magnitude  of  the  next  trial  factor.  If  the  re¬ 
mainder  does  not  have  einy  leading  zeros,  the  new  trial 
factor  is  formed  and  added  to  the  (negative)  remainder.  If 
the  new  remainder  has  leading  zeros,  certain  difficulties 
arise.  The  nonrestorixig  division  process  parallels  its  re¬ 
storing  counterpart  in  that  the  remainders,  except  for 
position  relative  to  an  arbitrary  fixed  reference,  are  the 
same  at  those  points  where  the  remainder  changes  sign  from 
negative  to  positive  in  the  nonrestoring  process.  However, 
the  trial  factors  in  the  rooting  processes  are  function¬ 
ally  dependent  upon  the  partial  root,  and  therefore  the 
remainders  in  the  restoring  and  nonrestorlng  algorithms 
will  not  correspond  unless  some  sort  of  correotlon  is 
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made.  Such  correspondence  to  the  restoring  algorithm  Is 
sufficient  to  guarantee  that  the  correct  n^^  root  Is  ex¬ 
tracted.  Thus,  when  the  trial  factor  Is  added  to  a  nega¬ 
tive  remainder,  a  correction  Is  also  added.  The  negative 
remainder's  leading  zeros  are  shifted  out  In  a  manner  sim¬ 
ilar  to  that  when  the  remainder  is  positive,  except  that 
in  order  to  ensure  that  the  remainder  changes  sign  from 
negative  to  positive,  It  is  shifted  left  until  a  "1"  ap¬ 
pears  In  the  bit  position  directly  to  the  right  of  the 
highest  order  bit  position  of  the  trial  factor.  However, 
when  the  remainder  is  negative,  I's  are  introduced  into 
the  partial  root  for  every  bit  position  that  the  remainder 
is  shifted  left.  Again,  it  is  seen  that  this  corresponds 
exactly  to  what  would  occur  given  the  same  remainders  at 
the  beginning  of  the  stages  involved  in  the  remGdnder's 
changes  of  sign  and  normalization* 

To  illustrate  the  mechanics  of  this  process,  an  ex¬ 
ample  of  the  restoring  and  nonrestoring  methods  applied  to 
a  binary  square  root  is  given  in  Figure  2-2.  Assume  we  are 
in  the  interior  of  a  square  rooting  process,  and  the  re¬ 
mainder  is  0.101011101,  the  trial  factor  is  0.1011101, 
and  the  partial  root  is  0.10111  .  The  symbols  are  R  s 
remainder,  TP  «  trial  factor,  and  C  =  correction. 
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Registers 

Partial  Root 

R 

♦0.101011101 

0.10111 

T? 

-0.1011101 

R 

♦0.101011101 

0.101110 

TP 

-0.010111001 

R 

♦0.0101001 00 

0.1011101 

TP 

-0.00101110101 

R 

♦0.00100011011 

0.10111011 

TP 

-0.0001011101101 

R 

♦0.0000101111111 

0.101110111 

NOHRESTORING 

Registers 

Partial  Root 

R 

♦o.ioiomoi 

0.10111 

TP 

-0.1011101 

2R 

-0.00010111 

0.101110 

TP 

-0. Shift 

8R 

-0.010111 

0.10111011 

TP 

♦O.lOlUOllOl 

16R 

♦0.101111101 

C 

♦0.000000010 

16R 

♦o.ioiium 

0.101110111 

Figure  2-2: 

Correspondence  Between  Restoring  and 
Nonrestoring  Square  Root  Processes. 

Corrections  to  Heoainders  in  the  Binary  Nonrestoring 
Rooting  Process  ' 

It  is  expected  that  the  correction  that  must  be 
made  to  some  of  the  remainders  during  the  nonrestoring 
rooting  process  will  depend  upon  both  the  partial  root 
and  the  number  of  shifts  reqxiired  to  normalize  the  re¬ 
mainder.  To  determine  the  value  of  the  correctioni  the  re 
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storing  and  nonrestoring  versions  of  a  given  Iteration 
vdll  be  coapared,  and  the  difference  in  the  final  renaln- 
ders  will  be  the  desired  correction.  Let  us  therefore  con¬ 
sider  a  group  of  stages  of  the  nonrestoring  process  which 
consists  of  one  subtraction  to  get  a  negative  remainder,  a 
normalizing  shift  of  s  bit  positions,  and  one  addition 
that  again  yields  a  positive  remainder,  and  compare  those 
factors  which  are  subtracted  from  the  remainder  Rj  ^  with 
the  corresponding  factors  in  the  restoring  process.  Let  us 
consider  the  square  root  process  first. 

A.  Restoring  Method: 
ss  “Izaj 

^  (2^-j-2j2j - ^2aj_^g  +  ^2^-3-s-lj2 

(26) 


The  relation  between  successive  partial  roots  is 

S-l 


“J  +  Z  .  0  <  e  <  k-l. 

1*0 


Then 

=  -^2aj(l-2“®"^)  2^"^  (l-2“®+2“^®“^)^ 

B.  honrestoring  Method: 

Since  aj-1  «  aj  , 


(27) 
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(28) 


|2aj(l-2“®"^)  -  2  2^''^(l-2~®)  2“®“^ 

+  2^“^ (1-2“^®“^)  ^ 

Talcing  the  difference  betv/een  (27)  and  (28), 

pNH  _  pR  _  _22k-2j  g-as-l 

S  8 

As  v/as  expected,  equation  (29)  indicates  that  too  ouch  was 
subtracted  from  the  remainder  and  thus  the  indicated 

correction  must  be  added  to  the  normalized  negative  re- 
malnder  along  with  the  nev/  trial  factor  in  order  to 
achieve  the  desired  relation  P^  -  P?  =  0.  In  order  to 

8  S 

transform  the  correction  in  (29)  to  a  value  applicable  to 
the  modified  algorithm  of  equation  (25),  it  must  be  multi¬ 
plied  by  2^'*'®'*’^,  because  the  process  has  advanced  3+S4-2 
stages  since  its  beginning.  Thus, 

C®  s  _2^'‘’®'*‘^(p^  _  p^)  s  2^^“^“®'*’^,  Ois<k-l, 

2  88  (5QJ 

where  Cg  is  the  correction  that  must  be  added  to  the  norm¬ 
alized  negative  remainder  along  with  the  new  trial  factor 
after  a  normalizing  shift  of  length  s,  for  the  nonrestor¬ 
ing  binary  square  root  (ns  2)  process  with  normalized  re¬ 
mainders. 

It  has  turned  out  that  the  remainder  correction 
for  the  square  root  process  is  dependent  only  upon  a 
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single  bit  position,  and  not  upon  the  partial  root.  How¬ 
ever,  a  short  examination  reveals  that  the  correction  la 
more  complex  for  the  higher  rooting  processes.  For  the 
square  root  the  correction  is  a  zeroeth  order  polynomial 
in  the  partial  root,  for  the  cube  root  a  first  order  poly¬ 
nomial  in  BO 

Extensions  of  the  Method  to  Floating-Point  Operands 

The  binomial  theorem  method  developed  so  far  has 
been  used  for  extracting  the  integral  roots  of  binary  in¬ 
tegers,  and  is  naturally  extendable  to  fixed-point  numbezti 
of  finite  but  variable  precision,  since  the  only  differ¬ 
ence  between  the  two  is  the  arbitrary  placement  of  the 
binary  point.  The  method  nay  be  easily  extended  to  compute 
the  roots  of  floating-point  operands,  i.e.,  a  mantissa 
part  multiplied  by  a  power  of  the  radix,  by  altering  the 
mantissa  (or  fraction)  according  to  the  radix  exponent. 
Specifically,  let  us  consider  binary  floating-point  oper¬ 
ands  of  the  form  A  «  f-2^,  where  1/2  ^  f  <  1,  i.e.,  the 
operand  A  has  a  normalized  fractional  part  f.  Let  us  now 
examine  the  exponent  b.  V/hen  talcing  the  n  root  of  f*2  t 
we  must  form  b/n^  desiring  this  division  to  have  a  zero 
remainder*  Suppose  b/n  «  Int*{b/n}  ♦  r/n*  Then  if  we  take 
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where  b*  «  b+n-r,  0  5  r  <  n,  the  desired  rooting  can  be 
done.  Since  2"^  ^  f  <  It  the  altered  fraction  will  lie  in 
the  range  <  f  <  which  still  satisfies 

equation  (2). 

Additional  r.echanization  Requirements  for  the 
Nonrestoring  Method 

In  general,  scientific-type  computations  males  ex¬ 
tensive  use  of  the  floating-point  representation.  There¬ 
fore,  because  there  is  the  possibility  of  shifting  the  op¬ 
erand  fraction  as  many  as  n-1  positions  to  the  right  be- 
fore  performing  the  n  root,  this  number  of  positions 
must  be  added  onto  the  low  order  end  of  the  remainder  and 
trial  factor  registers,  in  order  to  retain  a  precision  of 
1  part  in  2  when  extracting  a  k-bit  root. 

i\n  additional  set  of  registers  must  be  provided  for 
the  formation  of  the  remainder  correction,  which  is  a  pol¬ 
ynomial  of  order  n-2  in  the  partial  root  extreme 

parallelism  is  used,  the  extraction  of  the  n  root  could 
utilize  the  partial  root,  remainder,  trial  factor,  and 
correction  registers,  and  2n-3  multipliers,  2n— 4  shifters, 
and  2  adders. 
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CHAPTER  III 


Design  and  Simulation  of  a  Binary  Square  Root 
Device  Employing  the  Binomial  Theorem  liethod 

The  fixed-point  nonrestoring  binary  square  root  al¬ 
gorithm  given  in  equations  (2-25)  and  (2-30)  nay  be  mech¬ 
anized  as  a  digital  macro-operation  in  much  the  same  man¬ 
ner  as  division.  For  the  sake  of  reference,  the  algorithm 
equations  are  reproduced  below  for  the  remainder  at  the 
iteration; 

2Jrj  «  2^Rj_3^  -  2^(2aj_^  +  2^"-'^}  .  (1) 

v/here  Rq  =  A,  and  the  post-normalizing  correction  is 

C®  »  22k-3-s+l  »  0  <  s  <  k-1  .  (2) 

Let  us  consider  the  binenry  operands  as  being  in  the  form 
A  «  2®.f  ,  (3) 

where  1/2  ^  f  <  1,  euid  E  has  positive  or  negative  values. 
As  a  particular  example,  let  the  floating-point  binary  op¬ 
erand  in  (3)  be  of  the  form  used  in  the  IBM  7090,  namely, 
a  27-blt  fractional  part,  an  8-blt  characteristic,  and  a 
sign  bit,  making  up  a  36-bit  binary  word.  In  the  IBU 
floating-point  format,  the  characteristic  is  formed  by 
adding  128  to  the  exponent  E,  thus  disallowing  negative 
characteristics  and  restricting  the  exponent  range  to 
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(-127 f  127).  Negative  exponents,  then,  are  represented 
symbolically  by  characteristics  in  the  range  (1,  127).  Ex¬ 
traction  of  the  square  root  of  such  an  operand  will  be 


SI 


89 


35 


8 


27 


sign^  characteristic 


fi'action 


Figure  3-1:  IBM  7090  Floating-Point  Binary 
Format. 

achieved  by  performing  a  fixed-point  binary  square  root 
upon  the  fraction  part,  and  halving  the  characteristic. 
However,  there  are  tv;o  cases  which  must  be  considered. 
Case  1:  E  Odd 


If  the  exponent  E  and  therefore  the  characteristic 
of  the  operand  is  odd,  the  fraction  peirt  f  must  be  multi¬ 
plied  by  1/2  (shifted  right  one  bit  position)  and  the  fix¬ 
ed-point  square  rooting  pi^Jcess  initiated.  The  character¬ 
istic  of  the  resulting  floating-point  square  root  is  form¬ 
ed  by  halving  the  operand  characteristic,  adding  one  to 
the  units  position  (bit  8),  and  then  adding  64  to  the  re¬ 
sult  to  form  the  correct  value.  The  above  method  is  form¬ 
ulated  as 

(2E  f)l/2  ^  ^  (4) 

Since  1/2  £  f  <  1,  then  1/4  <  it  <  1/2,  and  so 
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1/2  i  (if <1/^2^  ;  thus  the  fraction  part  of  the  square 
root  Is  normeillzed.  The  characteristic  of  the  square  root 
is  formed  according  to 

Int.{i(E  +  128)}  ♦  1  +  64  =  (Iat.(iE}  +  1)  +  128.  (5) 

Case  2;  E  even 

If  the  operand  characteristic  is  even,  i.e.,  it  has 
a  zero  in  its  units,  position,  then  the  characteristic  is 
simply  halved  and  64  added  to  it,  and  the  fixed-point  bin¬ 
ary  square  rooting  process  is  applied  to  the  unmodified 
fraction  part,  f.  Symbolically, 

(2®£)^/^s  2^®-f^/^  ,  and  (6) 

i(E  +  128)  +  64  «  iE  +  128  .  (7) 

A  straightforward  magnitude  analysis  of  the  remain¬ 
ders  in  the  rooting  algorithm  (1)  shows  that  if  the  ini¬ 
tial  remainder  Bq  (which  is  the  fractional  part  of  the  op¬ 
erand  Itself)  is  Inserted  into  a  27--bit  register,  an  extra 
bit  position  to  the  right  of  the  27  bits  is  needed  in  or¬ 
der  to  save  the  lowest-order  bit  of  the  operand.  This  will 
make  the  remainder  register  a  total  of  29  bits  plus  sign, 
and  the  trial  factor  register  has  one  less  bit,  or  a  total 
of  28  bits  plus  sign.  Now  let  us  combine  the  remainder  and 
trial  factor  registers  into  a  binary  accumulator,  the  re¬ 
mainder  register  being  the  accumulator  register,  emd  the 
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trial  factor  register  being  the  addend  or  subtrahend  reg> 
later,  dependlxig  upon  whether  the  accumxilator  is  the  add¬ 
ing  or  subtracting  type.  An  exaalnatlon  of  the  additive/ 
subtractive  processes  during  the  square  rooting  procedure 
reveals  that  only  three  cases  are  allowed: 

!)•  H'*‘  -  TF*  ^0  R“  «  positive  remainder 

2) ,  H'*’  -  Ti'"'’  <0  R”  “  negative  remainder 

3) .  R"  TP”*’  >  0  ip”  “  positive  trial  factor 

If  the  accumulator  is  made  a  binary  subtracting  accumula¬ 
tor  (with  an  accumulator  and  subtrahend  register) ,  then 
C(AC)  C(AC)  -  C(SU)  represents  its  operation  symbolical¬ 
ly.  Further,  let  negative  numbers  be  represented  in  I's 
complement  form,  and  let  the  sign  bit  be  0  for  positive,  1 
for  negative.  In  this  case  the  three  cases  become 


Case 

end-around 

borrow? 

1).  R*  -  TP*  >  0 

no 

2).  R*  -  TP*  <  0 

yes 

3).  R“  -  (-TP*)  >  0 

zio 

For  each  case  2  that  occurs  it  is  expected  that  a  case  3 
will  subsequently  occur,  unless  the  rooting  pz*ocess  is 
terminated  during  the  normalizing  shift  or  before  the  nor¬ 
malizing  shift  takes  place.  In  case  3  the  term  -TP*  is 
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represented  as  a  I's  oonplement.  In  the  I's  coaplement 
representation  of  negative  numbers ,  the  conplenent  digits 
are  Just  the  inverse  of  the  digits  in  the  true  representa¬ 
tion,  and  thus  leading  zeros  in  the  true  representation 
are  leading  ones  in  the  complement  representation.  There¬ 
fore  normalization  of  the  remainder  takes  place  either 
with  a  zero  (•♦•)  sign  bit  and  leading  zeros,  or  a  "1"  (-) 
sign  bit  and  leading  I's,  zeros  augmenting  the  partial 
root  in  the  former  case,  and  I's  in  the  latter.  A  charac¬ 
teristic  of  the  I's  complement  representation  is  the  oc¬ 
currence  of  an  end-around  borrow  (or  carry)  as  in  case  2. 
Using  suitable  borrow  look-ahead  circuitry  (such  as  in  the 
IBM  7090),  the  end-around  borrow  may  be  reckoned  along 
with  the  normal  borrows  that  occur.  Thus,  subtraction 
talces  a  fixed  minimum  time,  whether  the  end-around  borrow 
occurs  or  not.  Note  that  there  is  no  ambiguity  in  the  rep¬ 
resentation  of  the  quantity  "zero",  since  only  -0  occurs 
(case  1). 

Let  us  assume  that  our  accvunulator  automatlceilly 
adjusts  the  final  difference  left  one  bit  position  upon 
the  execution  of  each  subtraction  to  account  for  the  fac¬ 
tor  2^  in  the  algorithm  (1).  The  accumulator  register  must 
be  equipped  to  shift  left  or  right  one  bit  position  upon 
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the  reception  of  left  shift  or  right  shift  signals,  zeros 
being  introduced  into  the  positions  vacated.  When  the  noiv 
oallzed  remainder  is  negative,  both  the  I's  complement  of 
the  new  trial  factor  and  the  I's  complement  of  the  correc¬ 
tion  must  be  subtracted  from  it.  The  only  other  operations 
to  be  considered  in  the  fixed-point  square  root  are  the 
augmenting  of  the  partial  root,  formation  of  the  new  trial 
factor  from  the  partial  root,  and  the  formation  of  the  re¬ 
mainder  correction  bit.  Because  of  the  simple  relationship 
between  the  trial  factor  and  the  partial  root  (eqn. (1)), 
there  is  no  necessity  to  carry  the  pamrtial  root  in  a  sep¬ 
arate  register,  since  it  can  be  clearly  identified  as  euri 
extractable  part  of  the  trial  factor,  and  extracted  from 
the  trial  factor  register  at  the  end  of  the  rooting  pro¬ 
cess.  The  organization  of  the  fixed-point  square  rooter  is 
given  in  Figure  3-2,  The  logical  equations  for  the  various 
control  signals  emanating  from  the  local  control  are  given 
later  in  this  chapter.  The  local  control  directs  the  root¬ 
ing  proepsa  according  to  the  various  decisions  that  have 
to  be  made.  A  flow  chart  describing  the  square  rooting  se¬ 
quence  and  the  inherent  decisions  involved  is  given  in 
Figure  3-3.  In  the  flow  chart,  the  following  symbols  are 
used: 
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DGLIHE  «  digit  line  selector; 


TPR  »  trial  factor  register; 
REMR  =  remainder  register. 


Local 

Control 

Figure  5-2:  Organization  of  the  Fixed-Point 
Sq.\iare  Hooter. 

It  has  been  shown  that  when  the  remainder  becomes 
negative  in  the  nonrestoring  rooting  process,  a  correction 
must  be  added  to  the  reraainder  along  v/ith  the  next  trial 
factor.  Specifically,  the  post-normalizing  correction  for 

8  21C"" 

the  square  root  is  given  in  equation  (2)  as  C2*2  “  , 

0  ^  s  <  k-1,  where  s  is  the  number  of  normalizing  shifts 
made  during  the  iteration  in  question.  An  examination  of 
the  above  expression  reveals  that  it  is  exactly  the  bit 
position  corresponding  to  the  digit  line  that  is  enabled 
at  the  time  that  the  addition  of  the  trial  factor  and  the 
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Pig*  3-3*  Binary  Square  Root  Micro  Flow  Chart 
Mantissa  Fart* 
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(negative)  remainder  takes  place.  Therefore  it  is  possible 
to  consider  mechanization  of  the  subtraction  and  correc¬ 
tion  functions  in  parallel,  with  the  addition  of  the  digit 
lines  being  suppressed  when  the  remainder  is  positive, 
l.e.,  when  its  sign  bit  is  a  zero. 

Internal  States  and  Control  Logic  for  the  Fixed-Point 
Square  Rooter  ^  "  — — 

The  operation  of  the  binary  square  rooter  may  be 
given  in  a  state  table  which  describes  the  sequential  oom- 
putatlon  in  terms  of  the  states  of  a  covmter.  The  state 
table  is  given  in  Table  3“!*  The  three  basic  operations  in 
the  fixed-point  part  of  the  binary  square  root  are  sub¬ 
traction  of  the  trial  factor  from  the  remainder,  augment¬ 
ing  the  partial  root  after  the  subtraction,  and  simultan¬ 
eously  shifting  out  leading  zeros  and  further  augmenting 
the  partial  root.  The  basic  decisions  made  during  the  pro¬ 
cess  depend  upon  the  disposition  of  the  remainder,  trial 
factor,  digit  line  selector,  and  the  state  cotuiter.  The 
state  counter  counts  in  the  sequence  given  in  Table  3-1. 
There  is  another  counter,  the  digit  line  counter,  that 
changes  state  every  time  a  different  digit  line  is  to  be 
enabled.  This  counter  has  27  states,  and  thus  requires  5 
memory  elements.  \7e  shall  let  the  counter  be  26  (11010)2 
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state 
Coimter 
T1  T2 

0  0 


4 


1  1 


1  0 


0  1 


Table 


Operation 


Hxamlne  REI.31  S,A,1,2  and  po¬ 
sition  1  of  IXjLINE  selector. 


(S)(A)(1)(IX;LINEL)  shift  RElffi  A,  1-29  left  one 

position.  Sinultaneously 
present  AUGMENT  signal. 
Advance  to  state  10. 


(S)(A)(l)(2)(IX}LINEl) 


(S){(A)(1)}  Perform  subtraction.  If  S =  0 

-  EEUR  «  REMR  -  TER.  If  S®!, 

4  (S){(A)(1)(2)}  REMR  =  REMR  -  comp. (TEH)  - 
comp.  (IXrLINE) . 

Advance  to  state  11. 


Form  AUGMENT  signal. 
Advance  to  state  10. 


Advcmce  digit  line  selector. 
Advance  to  state  01. 


Sxsuaine  digit  line  counter: 
#0:  Advance  to  state  00; 
=  0:  End  operation. 


3-1 ;  Table  of  Basic  States  for  the  Execution  of 
the  Fixed-Point  Pait  of  the  Binary  Square 
Root. 


36 


to  enable  DGLIHE  1,  and  zero  (00000)  to  enable  LGLINE  27, 
the  intervening  states  beiiag  assigned  in  descending  order. 
V/hen  DGLIKE  1=1,  the  possible  shifting  out  of  a  leading 
zero  is  suppressed  (state  00).  The  Important  register  bit 
positions  are  the  remainder  S,  A,  1,  2,  as  shov.Ti  in  Figure 
2~2,  The  remainder  left  shift  one  bit-position  signals  are 
derived  as  follows: 

1) .  Remainder  Positive  (S  =  0): 

LEFT  SHIFT  -  ("s)  (T)  (T)  (DGLINE  1 ) (157) ('t2 ) 

SUBTRACT  =  (S){(T)(T)}  +  (DGLINE  l)('Tr)(’T2) 

2) .  Remainder  Negative  (S  =  l): 

LEFT  SHIFT  »  (S)  (A)  (1)(2)  (DGLINE  l)(Tr)('TF) 

SUBTRACT  =  (S){(A)(1)(2)}  +  (DGLINE  l)vTr)('T2) 

The  AUGUENT  signal  is  generated  during  states  00 

and  11,  and  is  derived  from  the  following: 

AUGUENT  =  (DGLINE  1)  (('s)(T)(T)  +  (3)  (A)  (1 )  (2 ))  ('^)  ("tF) 

+  (T1)(T2J  . 

The  digit  line  coxmter  nay  be  counted  dov.n  one  step  upon 
the  reception  of  the  AUGMENT  signal,  provided  that  there 
is  a  delay  in  the  change  of  state  so  that  the  original 
state  of  the  counter  may  be  interrogated. 

Recalling  that  the  trial  factor  is  given  by 
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2^(2aj_j^  +  2^”^)  in  equation  (1),  ita  format  at  a  given 
stage  is  .XZ***XX01|  where  the  X*s  (J-1  of  them  at  the 
stage)  represent  the  ”0"  represents  the  current 

root  bit  whioh  is  to  be  determined,  and  the  "1"  is  the 
term  2^^"^,  During  the  next  stage,  i.e.,  the  the 

trial  factor  has  j  X's  followed  by  a  zero  and  a  one.  Thus, 
augmenting  the  partial  root  and  forming  the  next  trial  fa¬ 
ctor  may  be  done  at  the  same  time  in  a  single  logical  op¬ 
eration,  as  illustrated  in  Figure  3**4*  The  logical  opera¬ 
tions  of  augmenting  the  partial  root  (contained  in  IFR) 
and  forming  the  new  trial  factor  are  given  by  the  follow¬ 
ing  equations: 

gTER^»  ( AUGMENT) ((’s‘)(l»LINE)^(T2)  -  (S)(IX}LIHB)j^(T2) 

(D0LINE)j^_2  } 

gTPH^  ■  (AUGMENT)  (DG1INB)j^_3^ 

Timing  Study  of  t^  Square  Root  Device 

Since  the  execution  time  of  the  square  rooting  de¬ 
vice  depends  upon  the  statistically  distributed  magnitudes 
cf  the  intermediate  remainders  in  the  square  rooting  pro¬ 
cess,  it  is  expected  that  the  execution  time  Itself  will 
possess  some  sort  of  statistical  distribution.  This  dist¬ 
ribution  is  vexy  difficult  to  obtain  by  any  method  other 
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Figure  3-4:  Sinulteueously  AugMnting  the  Pertial  Root  and  Forming  the  Trial 
Fa«tar« 


than  direct  experimental  slmulatloni  since  the  distribu¬ 
tion  of  the  remainder  magnitudes  depends  upon  the  previous 
remainders  and  the  partial  root  during  the  square  rooting 
process.  A  computer  program  for  the  lEi  7090  was  \?rltten 
to  simulate  the  operation  of  the  square  rooter,  thereby 
enabling  certain  characteristics  of  the  method  to  be  de¬ 
termined.  The  basic  format  of  the  numerical  experiments 
performed  Is  shown  below: 


Figure  3-9:  Basic  Format  of  Niuaerical  Experiments. 

The  simulation  experiments  v/ere  performed  upon  the  frac¬ 
tion  part  of  an  IBM  floating-point  operand,  since  this  Is 
the  part  of  the  process  which  is  of  me-jor  interest,  and  in 
fact  is  the  dominant  factor  in  the  execution  time.  The 
fraction  parts  of  the  floating-point  words  were  in  the  in¬ 
terval  (1/4,  1),  but  were  generated  in  the  interval  (1/2, 
l)  by  a  pseudo— random  number  generator.  A  flow  chart  of 
the  binary  square  root  simulation  program  is  given  in  Fig¬ 
ure  3-6.  The  symbolic  locations  given  in  the  flow  chart 
correspond  to  the  locations  in  the  program  listing  (see 
Appendix)  at  which  the  indicated  operations  occiur. 
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Initlallsa 
Loo.  RT(2)4>3 
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Fig,  Flow  Chart  for  Binary  Square  Root 

Simulation  Program,  I'^antlasa  Fart, 
PP.  123-129  »  143-148  • 
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Pseudo-Pftwdftw  Number  Generator 

The  pseudo-random  number  generator  used  In  the 
nxunorlcal  experiments  was  a  multiplicative  oongruentlal 
type  as  described  by  Rotenberg  [llj .  The  multiplicative 
cozigruence  algorithm  is 

*i+l  »  (2®  +  +  0  ,  Hod.  2^  ,  (8) 

where  a  is  a  real  Integer.  Rotenberg  applied  seversl  ei»« 
plrical  tests  to  the  above  algorithm  with  a*7»  C»lt  and 
p  a  35.  He  found  that  the  resulting  numbers  were  tmlformly 
distributed  and  that  there  was  no  detectable  serial  corre¬ 
lation  in  the  sequence.  The  cycle  structure  of  the  multi¬ 
plicative  congruence  method  has  been  determined  analyti¬ 
cally,  and  it  is  known  that  algorithm  (8)  can  generate  the 
full  period  of  2^  numbers  if  a  •  2  and  C  is  odd  [jLlj].  The 
serial  correlation  between  two  consecutive  niunbers  in  the 
sequence  has  been  shown  by  Coveyou  ^3^  to  be 


The  27-blt  pseudo-random  numbers  used  were  generated  in 
the  interval  (1/2,  1)  by  first  generating  a  26-bit  pseudo¬ 
random  number,  and  then  putting  a  "1”  in  front  of  it,  mak¬ 
ing  a  27-bit  niimber.  The  algorithm  parameters  used  in  (8) 
were  a*  11,  C«l,  and  pa 26,  and  the  resulting  serial  oor- 
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relation  between  two  successive  numbers  is,  from  (9) 


fW-  *1*1  > 


» 


The  Initial  random  number  in  octal  form,  was  232544614f 
but  other  runs  of  the  experiment  showed,  as  should  be  the 


case,  that  the  results  were  ixisensitlve  to  Xq  after  a 
reasonable  sequence  length  in  (8). 

Experiment  I;  Property  Distribution 


To  reveal  in  a  general  way  the  efficiency  of  the 
nonrestoring  square  root  method  with  normalized  remainders, 
the  previously  defined  figure  of  merit  "root  bits  per  it¬ 
eration"  was  obtained  as  a  function  of  the  magnitude  of 


the  operand  characteristic*  No  knowledge  was  assumed  con¬ 
cerning  the  nature  of  the  operands,  other  than  that  they 
belonged  to  the  class  of  all  properly  normalized  binary 
floating-point  operands  of  the  IBM  format.  Therefore  it 
was  assumed  that  the  operand  fractions  were  xmlfomiLy  dis¬ 
tributed  over  the  interval  (1^4#  !)•  It  something  more 


were  known  about  the  nature  of  the  operands,  it  might  be 
possible  to  restrict  the  interval  of  Interest,  and  in  gen¬ 
eral  entirely  different  conclusions  concerning  the  meth¬ 
od's  computational  efficiency  relative  to  the  sublnterval 
of  interest  covCLd  be  drawn.  As  an  additional  point  of  int- 
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erestf  the  average  number  of  corrections  per  operand  (27<> 
bit  fraction)  was  also  determined,  and  plotted  versus  the 
fraction  part.  For  the  experiment,  the  Interval  (1/4,  1) 
T/as  subdivided  Into  48  parts,  making  the  class  Interval 
equal  to  1/64.  The  results  were  averaged  within  each  lnter-> 
val,  since  only  the  trend  of  the  properties  In  question 
was  desired. 

The  results  are  shown  In  Figure  3-7.  It  Is  apparent 
that  there  Is  a  general  decrease  In  efficiency  and  henoe 
an  Increase  In  execution  time  as  the  magnitude  of  the  op¬ 
erand  fraction  Increases,  since  there  Is  a  decreasing  num¬ 
ber  of  root  bits  per  Iteration  being  obtained,  as  shown  In 
Figure  3'*7A.  The  irregularities  In  the  curve  are  due  to 
the  dependence  of  the  method's  speed  upon  the  patterns  of 
ones  and  zeros  In  the  root  Itself,  and  thus  are  difficult 
to  trace  hack  to  the  bit  arrangements  In  the  operand.  How¬ 
ever,  there  Is  a  definite  trend  shov/n,  and  the  minimum  av¬ 
erage  root  bits  per  Iteration  obtained  was  1.38  In  the 
subinterval  (63/64,  1),  the  maximum  was  2.70  In  the  subin¬ 
terval  (3A6,  21/64),  and  the  mean  value  was  1.91  root 
bits  per  iteration  in  the  entire  Interval.  The  minimum  and 
maximum  given,  of  course,  are  not  absolute,  sinoe  averag¬ 
ing  the  results  in  eaoh  class  Interval  "blunted"  these 
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Figure  5*‘7t  Properties  of  the  Nonrestoring  Square  Root 
Method  Using  Normalised  Remainders. 
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values.  Thus,  in  taking  the  square  root  of  the  fraction 
part  of  a  normalized  floating-point  binary  number  drawn  at 
random  from  the  population  of  all  numbers  of  this  type, 
the  expected  figure  of  merit  Is  about  1.91  root  bit  per 
iteration,  i.e.,  it  Is  expected  that  an  average  of  0.91 
root  bits  will  be  obtained  by  normalizli^g  the  remainder 
each  Iteration. 

In  the  development  of  the  nonreatorlng  binomial 
theorem  method  it  was  shown  that  the  remainder  must  be 
corrected  each  time  it  becomes  negative.  To  get  an  Idea  of 
how  many  times  this  occurs  on  the  average  per  operand,  the 
average  number  of  corrections  per  operand  was  measiired  In 
the  same  way  as  the  number  of  root  bits  per  iteration  was. 
The  results  are  given  In  Figure  3-7B.  The  measured  average 
minimum  was  about  0.05  corrections  per  operand,  the  maxi¬ 
mum  about  6.03,  and  the  mean  about  3.85* 

Experiment  II:  Timing  Distribution 

In  order  to  evaluate  the  performance  of  the  binomi¬ 
al  theorem  square  rooting  method  with  respect  to  execution 
time,  another  numerical  experiment  was  performed,  and  this 
time  the  total  execution  time  taken  to  operate  upon  a 
floating-point  binary  operand  v/ae  measured  In  terms  of  a 
defined  time  unit.  The  previously  discussed  device  using 
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the  1*8  complement  representation  for  negative  numbers  mas 
investigated  as  a  particular  example.  Throughout  the 
square  root  process  there  are  certain  time  costs  which 
must  be  "paid”  in  order  to  accomplish  the  various  func¬ 
tions  Involved.  These  time  costs  represent  different  phas¬ 
es  of  the  process,  and  were  chosen  as  modifiable  paramet¬ 
ers  which  Influenced  the  total  execution  time  of  the  pro¬ 
cess  in  varying  degrees.  The  following  parameters  were 
chosen: 

1) .  »  time  taken  to  execute  the  subtraction  of 

the  trial  factor  from  the  remainder; 

2) .  T  time  taken  to  augment  the  partial  root  and 

form  the  new  trial  factor;  and 

3} •  T  B  time  taken  to  shift  the  remainder  one  bit- 
8 

position  dvirlng  the  normalizing  shift,  all 

being  given  in  time  units. 

Thus  a  complete  iteration  will  take  ^ 

units,  s  being  the  nvunber  of  one  bit-position  normalizing 
shifts  made  during  the  iteration.  Only  the  fixed-point 
portion  of  the  square  rooting  process  was  simulated,  with 
the  operands  in  the  range  (l/4|  1).  Since  floating-point 
operands  are  being  considered,  there  is  an  additional  fix¬ 
ed  amount  of  time  associated  with  determining  whether  the 
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exponent  ie  odd  or  even.  This  would  nerely  shift  the  tim¬ 
ing  distributions  without  altering  their  essential  charac¬ 
ter.  It  was  assumed  that  sensing  whether  the  exponent  was 
odd  or  even  and  conditionally  shifting  the  operand  frac¬ 
tion  one  bit-position  to  the  right  could  be  done  in  the 
time  taken  to  perform  a  one  bit-position  shift,  and  this 
time  cost  was  accnied  whether  the  right  shift  occurred  or 
not.  In  performing  the  experiment  another  assumption  v;as 
made,  namely  that  in  the  course  of  examining  the  floating¬ 
point  exponents,  even  and  odd  exponents  occur  with  eq.ual 
frequency.  Accordingly,  then,  of  the  total  sample  of  frac¬ 
tion  parts  processed,  half  were  taken  in  the  range  (1/4,1) 
and  half  in  (1/2,  1). 

In  order  that  a  meaningful  distribution  be  obtain¬ 
ed,  it  was  important  that  sensible  or  typical  values  be 
assigned  to  the  parameters  T^,  €Uid  T^.  The  square 

rooting  process  consists  of  a  series  of  subtractions,  log¬ 
ical  operations,  and  one  bit-position  shifts,  and  there¬ 
fore  if  a  proper  relation  between  these  parametere  is 
used,  the  problem  will  be  resolved.  As  a  typical  example, 
the  execution  times  of  the  relevant  operations  in  the  IBU 
7090  arithmetic  unit  were  used  [6].  The  fixed-point  addi¬ 
tion  tedtes  3  clock  times,  whether  the  operands  possessed 
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like  or  unlike  signs.  Since  v/e  are  using  the  I's  coaple- 
ment  representation  for  negative  numbers  internal  to  the 
process,  no  additional  recomplenentation  time  is  required 
to  obtain  a  signed  magnitude  form  as  is  done  in  the  IBU 
7090.  It  may  be  desirable  in  certain  Instances,  hov/ever, 
to  recomplement  the  final  remainder  and  present  it  as  out¬ 
put  information  in  a  register  at  the  conclusion  of  the 
square  root  operation,  but  this  was  not  done  in  the  exper¬ 
iment.  One  single  bit-position  shift  in  the  IBU  7090  ar^ 
ithmetic  unit  is  performed  in  one  clock  time,  and  thus  the 
add-to-shlft  ratio  is  obtained.  Since  in  our  equipment  it 
was  postulated  that  the  logical  operations  of  augmenting 
the  partial  root  and  forming  the  new  trial  factor  could  be 
accomplished  simultaneously  in  the  time  required  to  per¬ 
form  a  one  bit- position  shift,  the  problem  cem  now  be  ful¬ 
ly  specified.  Therefore,  if  “  3  and  T^*»  1  time 

iinit,  the  parameters  (3»lti)  v/ill  describe  a  meaningful 
problem. 

The  probability  density  and  cumulative  distribution 
functions  for  this  problem  were  obtained  from  a  simulation 
program  for  the  IBU  7090  (see  Appendix),  and  are  displayed 
in  Figure  3-8.  2^^  operands  were  processed,  and  with  the 
parameters  used  no  operand  took  less  than  42  time  units  to 
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Figura  3*-8t  Statistical  Timing  Distributions  for  tha  Binary 
Square  Rootf  Binomial  Thaoram  Mathod|  1*s 
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execute,  and  none  more  than  108.  It  Is  seen  that  the  dist¬ 
ribution  of  execution  time  is  skewed  to  the  right,  and  for 
the  purposes  of  graphical  analysis,  l.e.,  to  detcznolne  the 
mean  and  variance,  it  is  convenient  to  make  a  transforma¬ 
tion  of  variables  such  that  a  function  ^(t)  of  the  execu¬ 
tion  time  t  becomes  normally  distributed.  Such  a  transfor¬ 
mation  is  ^$3 


<^(t)  = 


g(t)  -  g(yxj) 


(10) 


v/here  g(t)  Includes  no  unknovm  parameters.  The  cumulative 
distribution  function  for  execution  time,  when  plotted  as 
in  Figure  3-8,  gives  the  probability  that  a  reindomly-chos- 
en  operand  of  the  type  considered  will  tedce  more  than  (or 
less  than)  a  specified  number  of  time  units  to  have  its 
square  root  extracted  by  the  binomial  theorem  method.  The 
cumulative  distribution  is  plotted  on  a  normal  probability 
scale  in  Figure  3-9,  and  is  plainly  skew.  If,  however,  the 
ciuaulative  distribution  of  log-^^Qt  is  plotted  as  in  Figure 
3-10,  it  is  found  that  this  distribution  may  be  approxima¬ 
ted  by  a  straight  line,  and  thus  the  variable  (logj^Qt  - 
n 

logioA)/o^  is  approximately  normally  distributed,  where 
is  the  median  of  t  and  is  the  standard  deviation  of 
logiot.  From  Figure  3-10,  the  median  is  about  66  time 
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Figure  5-10 


\uxit3.  The  mean  is  given  by  10 v.'here  M  * 

log^O^  8  0*4?43«  To  compute  the  standard  deviation  of 
logj^o'tf  note  the  values  of  t  where  the  cumulative  distri¬ 
bution  is  equal  to  0.159  and  0.841;  these  values  are  37 
and  77  tine  units.  Taking  the  average  value,  * 'i'(lo6j^Q77 
-  logj^Q57)»  or  about  0.065.  The  mean  is  then  about  67 
time  units.  The  average  standard  deviation  of  t  is  iK77  - 
57)  f  or  about  10  time  units.  A  direct  computation  using 
the  experimental  data  yielded  a  sample  mean  of  68.8  time 
units  and  a  standard  deviation  of  10.6  time  units,  both 
values  being  verified  by  their  graphical  estimates. 

It  then  can  be  concluded  that  a  randomly-chosen 
floating-point  binary  operand  of  the  format  chosen  has  an 
expected  execution  time  of  about  69  tii:ie  units  with  stand¬ 
ard  deviation  10.6,  when  processed  by  a  square  rooter  of 
the  type  described,  a  time  unit  being  the  time  necessary 
to  perform  a  one  bit-position  shift.  The  minimum  execution 
time  is  42  time  \mlts,  eind  the  maximum  108,  on  the  order 
of  3.5  and  9  IBM  7090  machine  cycles,  respectively.  This 
compares  rather  favorably  with  the  67  cycles  needed  by  the 
SHARE  program  described  in  Chapter  I. 
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CHAPTER  lY 


Other  Rooting  Methods 

The  hinomlal  theorem  method  obvioiuJly  lent  itself 
to  direct  mechanization  of  the  square  root  operation.  In 
this  chapter  the  properties  of  other  n  rooting  proced¬ 
ures  will  be  considered,  to  provide  a  foundation  for  oon- 
parison  with  respect  to  mechanization  parameters, 

4-1 :  The  Euler  Iteration  Formulae 

In  a  recent  article  [ij]  ,  J,  P.  Traub  has  outlined 
a  method  for  generating  iteration  formulae  of  arbitrary 
order,  along  with  an  error  estimate.  The  following  devel¬ 
opment  is  essentially  his  as  given  in  his  paper. 

Let  us  start  by  deairir,g  a  real  root  of  the  func¬ 
tion  y=f(jc)  =  0  and  denote  this  root  as  oi  ,  so  that  f(ot)«« 
0,  The  only  assumption  that  is  made  is  that  OC  be  a  root  of 
multiplicity  one.  Given  the  inverse  relations 

y-f(x)  ,  X  «  g(y)  ,  (1) 

then  g(0)  =  g(y^  -  y^)  .  (2) 

Expanding  (2)  in  a  Taylor  series  gives 

where  the  parenthized  superscript  denotes  a  higher  deriva- 
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(4) 


tlve.  Since  (3)  reduces  to 


oc  «  f 


Defining 


u  *  f(x^)/£'(Xj^)  and 


(5) 

(6) 


(4)  takes  the  laore  compact  form 

00 

a  -  Xi  -  u  2.  •  (7) 

k«o 

If  we  then  talce  only  the  first  m<t-l  terms  of  the  series  in 
(7)t  and  denote  the  right  side  of  (7)  as  a  better  approxi¬ 
mation  to  OC  than  x^  (assuming  that  the  sequence  of  approx¬ 
imations  converges),  the  following  iteration  formula  is  a 
natural  consequence: 


♦n 

*141  - 

k«o 

Defining  the  Euler  polynomial  as 

YCu)  =  Yjc 

k-o 

transforms  (8)  into 

=^i+l  •  ^1  -  • 


(8) 


(9) 

(10) 


Defining 


f 


(11) 
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•  •  •  I 


Traub  8hov/8  that  l8  a  polynomi8l  in 
nhere 


\  "  ®2^-l  dx  • 

1  , 


8uoh  that 

Y^»  (1/2)1:^ 

Yg  (1/2)  -  (l/6)Dj  ,  etc. 


(12) 


(13) 


The  error  of  the  Iteration  formula  (10)  may  be  eatimated 
by  considering  the  error  oc  -  the  remainder  of 

the  truncated  series  in  (8) : 

CO 

«U1-  •  (14) 

♦m+i 

If  f(x)  is  a  smooth  curve  in  the  neighborhood  of  x  «0C|  we 
may  write 

f(«  +  £)s»  f(a)  +  €  f*(oc)  , 

where  €  is  a  small  error.  On  the  iteration,  x^«0l+€^, 
and  since  f(oi)  »  0,  f (x^)  «  6^f'(ot).  Since  f(x)  is  smooth, 
f'(x^)«  f'(^),  and  thus  the  error  nay  be  estimated  as 

f(x^)/f’(Xj^)  »  u  .  (15) 

V  V 

Thus  u  »  8^,  and  so 

■ 

Expanding  Yj^  In  a  power  series  about  X  ,  amd  assuming 
that  , 
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€ ^  (OC )  £  ^  >  m  * 0»1  j2, , . .  (16) 

Thus,  for  a  given  value  of  m,  an  iteration  formula  of  ord¬ 
er  m-f2  may  be  obtained  from  (8),  vd.th  error  estimate  (16). 

In  an  earlier  paper  [12J  Traub  compared  vEurioxxs  it¬ 
erative  methods  for  the  calculation  of  n^^  roots,  and  in¬ 
troduced  an  iterative  formula  which  he  called  "multiterm" 
iteration,  an  iteration  formula  which  may  be  derived  from 
the  Euler  formula.  Multiterm  iteration  considers  the  spec¬ 
ial  equation  f(x)  =  -  A,  where  f(ot)  =  0,  with 

OC  =  *  x(l  -  f/x^)^/“  .  (17) 

Letting  v  =  -f/xf',  OC «»  x(l  +  nv)^^*^,  or 


Noting  that  v  =  -u/x, 

OC  *  X  +  X  2,  (^k^)  ^ 

k«i 

Using  f(x)  as  given  above, 

«  f(^)/fi  s  (n-l)(n-2)—*  (n-k-»-l)x“^‘*’^  .  (20) 

Comparing  (19)  with  (7),  using  (20)  gives 

Y.  »  (n-l)(2n-l)---*(lcn  -  l)x”V(k+l).',  lc-0,1,2,... 

^  (21) 

for  this  special  case.  Multiterm  iteration  may  be  made  any 

order  by  considering  only  part  of  the  Infinite  series  in 
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(18).  Speolfically,  the  iteration  fornula  of  order  m  la 


where 


"1+1 


+  X 


xi:-. 


(22) 


l»2t3i» . » 


(23) 


The  upper  hound  on  the  error  is 


Trauh  points  out  that  the  multiterm  iteration  formula  may 
he  apid.ied  in  a  sequence  such  that  the  order  of  each  suc¬ 
ceeding  application  may  or  may  not  be  changed,  until  the 
root  has  been  computed  to  the  desired  precision. 

Rational  Approximations  to  the  Ehiler  Polynomial 

In  his  i>aper,  Traub  also  considers  rational  approx^ 
legations  to  the  Euler  polynomial  of  a  form  due  to  Fade. 
V/ritten  this  way, 


where 


Y(u)  W 

P{u)/Q(u)  , 

(25) 

P(U)  « 

A 

(26) 

Q(u)  » 

o 

(27) 

nay  be  written 

^+1  -  ^i 

-  uP(u)/Q(u) 

(28) 

Writing  (7)  as  0L»  -  uY(u)  -  E  , 


(29) 
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Where  subtracting  (28)  fron  (29)  gives 

^  -  u{p(u)/Q(u)  -  Y(u)J 

or  -  uH(u)/Q(u)  +  E  ,  where 


H(u)  *  P(u)  -  Y(u)Q(u)  =  X  .  (30) 


Referring  to  (30),  if  the  leading  term  of  H(u)/Q(u)  is 
proportional  to  then  analogous  to  (16),  the  itera¬ 

tion  formula  (28)  is  of  order  m+2.  Thus  Traub  chooses  the 
p+q+1  parameters  Pj^,  Qj^  so  that  =  0,  kB0,l,2,...,  p+q, 

vd.th  p+qem.  To  do  this,  equate  like  powers  of  u  in  (30), 
using  the  series  in  (26)  and  (27).  Traub  gives  the  result¬ 
ing  equation 

Vrp-  S-Vr-lc'O  >  (51) 

k«o 

where  f  1  r  <  p 


w 


(32) 


\.  0  r  >  p 

and  s  =  min,  (r,  q)  .  (33) 

Thus  (31)  can  be  used  to  find  the  Pj^  and  Qj^  recursively, 
since  the  Yj^  £u:e  known  (eqn.  (13)),  and  Pq  *  !•  Traub  then 


gives  the  corresponding  error  formula 

6  (Y  -  H  ) 

^i+1 “m+l'^i 


(34) 


wliich  indicates  an  iterative  formula  of  order  m+2,  where 


The  iterative  fornula  (27)  niay  then  he  v.Titten  in  the  coia- 
pact  form 


where  )  is  defined  as 

P‘1  i 


(36) 


(x^)  a  X 


i 


P*0yly2|ia*|m 

q»0,l,2,...,m  (37) 
p+q  ■  m  . 


Equation  (36)  then  defines  m+l  iterative  formulae,  a  few 


of  which  are  summarized  below: 


1).  m  s  0 


^00  “  * 


£  *  Y  £ 

i+1  1  i 


(38) 


2),  m  *  1  : 

®  X  —  u(l  +  Y^u)  ;  ^  i+1  *  ^2  ^i 

^01  “  *  “  1  '  ^i+1  “  ^^2  "  ^i 


(39) 

(40) 


3) .  m  “  2  : 


IgQ  =  X  -  u(l  +  Y^n  +  YgU^)  }  «  Y^fiJ 


(41) 


^1“ 


X  -  u 


Yi  +  u(Y^  -  Yg) 
\  -  ^2^ 


Y  Y  —  Y^ 

.  6  -  ^2  ,4 

»  *=i+l  Y^  '^i 


(42) 


l02  =  X  - 


_ u 

1  -  Yj^u  +  (Y^  -  Y2)u^ 


^i+l“  ^^3  “  2YiY^  ♦  Y,^)e 


1*2  *1'  i 


(43) 


In  the  above  formulae,  x  =  x^,  ajid  in  the  error  estimates 
the  Yjj  are  evaluated  at  the  n  root  X  .  The  formulae  Ijjo 
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are  those  which  result  from  equation  (10) ,  the  iteration 
formula  hefore  the  Fade  approximation  was  applied.  For  the 
particular  example  f(x)  *  x*^  -  A,  Traub  indicates  that  the 
formulae  of  the  form  are  preferable  from  the  stand¬ 
point  of  error  estimate.  A  remark  by  Kogbetliantz  [^9^  also 
states  that  rational  approximations  of  this  form  are  the 
most  useful. 

Specialization  to  the  Extraction  of  n*^  Roots 

In  order  to  apply  the  above  methods  to  the  extrac¬ 
tion  of  integral  roots,  the  particular  equation  f(x)  =  x*^ 
-  A  must  be  considered.  The  for  any  particular  n  are 
given  in  equation  (21),  and  the  first  few  are 


*  (n  -  l)/2x 
Yg  =  (2n^  -  3n  +  l)/6x^ 

Yj  =  {Sr?  -  lln^  +  6n  -  l)/24x^  ,  etc. 


(44) 


Also, 


u  =  i- 

fl 


-n+1 

5-- — {j^  -  A) 


(45) 


Using  (44)  and  (45)  to  write  out  the  first  few  iteration 
formulae  gives 


^00 

€ 


=  5  [(>'-1)='  - 


(46) 

(47) 
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£  «  _ r  3n  4  1  ^3  <  l//^2  .  o\  £  5 

i+l  :::2 - ^i^jCAn  -6n  +  2)£^ 


6vr 


I..  • 

L(n-l)x^  -  (n-l)Aj 


'•01 


-  ^  1  tf  3  <lf  2  .  3 

'i+l  “  3'^  ”  1} 

^  120C  ^  ^  ^ 


‘•20  “  ^0  “ 


2n_=_3S_+_i 

6r^ 


I  - 


..n 


(48) 

(49) 

(50) 

(51) 

(52) 


€  .  T  ±  ef  <  \{er?  -  lln^  +  6n 

i+l  240?  ^  5 

-l)€j  (53) 


Ill=  xU 


(7n  -  l)x^  -  (n  -  1)A 


xV  (2n  -  2)x^  -  (4n  -  2)A 


(54) 


•i+l 


2nl-_ng,-^n^  ^4  ^  .  „2  .  jn 

72X^  ^  ^ 

+  !)€£  (55) 


(511*^  4  5n  ■>•  l)x^^  +  (Sn*^  -  5n  -  2)x^  +  (1  -  n^)A‘ 
{3r?  +  6n  +  l)x^^  +  (8n^  -  6n  -  2)x^  +  (1  -  n^)A^J 

(56) 


2x  .2' 


I  s®  X  ' 
02 


■i+l 


_  +  n  »  2  £  4  <  1,„3 


24et^ 


£  J  5  ^(n^  +  n  +  2)  ej 


(57) 


As  is  expected,  the  iterative  formulae  become  more  compli¬ 


es 


cated  as  their  order  Increases,  and  hicher  order  fornulae 
may  be  derived  from  an  extension  of  (44)  and  from  (45), 

4-2 :  The  Fade  Table  of  Rational  Approximations 

This  method  enables  a  general  pov7er  series,  whether 
convergent  or  divergent,  to  be  approximated  by  a  rational 
function  of  the  form  R  =  P_(x)/Q^(x) ,  where 

jTB  a  8 

a 

^r^^^  “  •  (58) 

0 

5 

Qa(x)  “  1  4  51  *  (55) 

I 

V/e  desire  the  approximation 

DO 

f(x)  >=  51  ^  “  Py(x)/Qg(x)  ,  (60) 

0 

and  if  the  definition 

CO 

Qa(x)  y  (61) 

o  o 

is  imposed,  the  coefficients  a^^  and  b^  may  be  found  from 
the  resulting  linear  system  of  r+s+l  equations.  In  gener¬ 
al,  the  accuraoy  of  the  approximation  Kpg(3c)  Increases  as 

the  degree  of  P_(x)  and  Q  (x)  increases.  According  to  E. 
r  8 

G.  Kogbetllantz  [[9]  ,  the  entries  In  the  r  by  s  table 
which  are  the  most  useful  are  those  for  which  r«a  or  t» 
S'fl  •  If  r «  8 ,  then  clq  Cq  ,  and 


66 


(62) 


^  b  0  ,  =  0 

r  8-r+l 


# 

i 


\  *  21'‘r®i_r  *  i“  1,2,3,...,  s  , 
h*o 

(63) 

and 

S 

*  21V2s+k^l-r  ’  lc«0,l,2,... 

(64) 

h»o 


The  decrease  extremely  rapidly,  and  thus 

oo 

Therefore  as  a  rough  estimate  (ras), 


Furthermore,  Qy(x)  1,  and  thus 

Ej,(x)  «  Vq  .  (66) 

Since  the  range  of  x  and  the  order  r  are  presumed  to  be 
knovm,  a  rough  estimate  of  the  error  may  be  obtained  by 
computing  If  0  5  x  <  Xq  , 

l®rl-KoK”^  • 

Yq  is  obtained  by  solving  the  system  of  r+l  equations 

(62)  and  (64)  with  r- s,  lc*0  : 

5 

21Vs-n.i“°'  i«1.2,3,... 

h»o 
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and 


This  yields 


5 

^0  *  ^VzsU-r 

^0  '  V^r  ,  where 


A 


r 


°1  Cg  ... 

Cl  ...  Op 

®2  ®3  **'  °i>2 

°2  ®3  **’  °r+l 

•  • 

;  Sj.- 

•  •  • 

#  ^  0 

^  # 
m  m 

•  •  m 

®rfl  °r+2  •••  °2r+l 

°r  °r+l  ®2r 

(68) 


Sy  being  the  principal  minor  of  A  ^  .  The  approxj.mation 
R _ (x)  e  P  (x)/Q  (x)  may  be  written  as  a  continued  frac- 

XT  T  T 

tlon 


(69) 


and  the  coefficients  Aj^,  may  be  found  by  combining  and 
cross-multiplying  (69).  An  exaraination  of  (69)  shows  that 
parallel  computation  enables  R^^ (x)  to  be  formed  in  r  div¬ 
isions  and  r-fl  additions. 

Specialization  to  Root 

Kogbetllantz  treats  this  problem  by  considering  the 
approximation  in  a  general  interval  (b,  c)  using  the  sub¬ 
stitution  X  »  a(l  +  z),  v/here  b  <  a  <  e.  Then 
^1/^(1  ^  is  ej^anded  into  a  binomial  scries 
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00 


Zf/-)  .>/- .. ,  |z|  £  1,  and  a  Pade  approximation  for¬ 
med.  To  further  restrict  the  range  of  z,  let  b  “  a(l  - 
and  c  =  a(l  f  Tg)!  0  <  rj^<  1,  0  <  rg  <  1,  ao  that  ^ 


2  Tg  ,  v/hlch  still  satisfies  |z|  <  1.  Let 


00 

Kr-> 


Y(z)=2.\ 


(70) 


As  k  gets  large,  the  ratio  approaches  -1,  and 

thus  the  series  may  be  approximated  by  an  alternating  geo¬ 
metric  series  which  has  a  known  sum.  Therefore 

Y(z)  ^3  Yq/(1  +  z)  .  (71) 

Then  the  error  fomvila  (65)  may  be  written 

Yq 

(1  +  z)  Qjzi  • 

The  relative  error,  E^(z)/x  '  ,  is 

2r+l 


^0“ 


(72) 


Letting  E^(z)  *  Ktp(z)  where  K  =  constant,  it  has  been 
found  that  the  extrema  of  the  relative  error  lie  at  z  « -r^ 
and  z  =  rg  .  Equating  the  absolute  value  of  the  relative 
error  at  these  values  of  z  gives  |E^(-rj^)|  *  |Ej,(rg)|  . 
Vi'ritten  out. 
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(73) 


2r+l 


2r+l 


(1-  '  (1  t  9^(rj)  ' 

The  ratio  c/b  gives  a  second  equation  involving  and  Pg* 
c/b  =  (1  4.  r2)/(l  -  Pj^)  ,  (74) 

nhere  c/b  is  a  known  constant  since  the  interval  (b,o)  has 
been  specified.  Solving  (73)  and  (74)  yields  the  desired 
values  r^  and  r^t  so  that  the  maxinun  relative  error  and 
the  constant  a  nay  be  computed.  The  constants  a^,  a^,  agf 
which  are  functions  of  a,  are  then  computed,  and  then 
a  continued  fraction  'representation  may  be  obtained  of  the 
form 


.V  _A_ 

^  Z  +  4- 

k«i  *• 


(75) 


Substituting  z  =  (x  -  a)/a  into  (75)  gives  the  desired  ap¬ 
proximation  to  In  his  article  Kogbetliantz  gives  se¬ 

cond  order  (r«2)  resiilts  for  the  square  root,  n«2  i 

1/2^  5/70  50 /to/49  4/49^  _ 

*  14  "  X447/14  4-  ’  xT3/14 

0.25  4  X  <  0.5  .  |Eg|  <  10"^  , 

,1/2  ..  5n/35  _  200  /F/49  16/.4^ 

*  7  x+47/7  4>  ’  X  +  3/7 

0.5  <  X  <  1  ,  lEgl  <  10"5  . 

The  accuracy  of  this  type  of  approxlnation  can  be  improved 
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either  by  using  higher  order  rational  approxi nations  or  by 
decreasing  the  size  of  the  intei'val  in  v/hich  the  approxima¬ 
tion  is  valid.  Prom  the  standpoint  of  computing  time  the 
latter  is  preferable,  although  it  results  in  more  storcige 
apace  being  required. 

The  simplest,  though  not  the  most  accurate  rational 
approximation  which  is  a  function  of  the  operand  Is 

f(x)  V  «  -y-— ,  X  e  x{z)  ,  (76) 


v/hich  can  be  computed  in  one  multiplication,  one  addition, 
and  one  division.  In  order  to  use  this  approach  to  extract 
integral  roots,  let  us  consider  the  function  f(x)  = 
n  =  2,3,4,>«>,  where  x  s  a(l  f  z) ,  |z|  <  1,  As  before. 


1/n 


CO 

z 


V 


1/n 


«•)  ■ 


and 


oo 


to 


(1  +  \z)  XI ®1^^  • 

0  0  (77) 

Solving  (77) » 


With  k«  0, 


\  “  ns2,3,4,.,,  , 

®kt3  ^°k+2  '  '^k  '  lc-0,1,2,,,, 

Vq  “  °3  ^  h.°2  » 

Y  s  -l/n(n^  -  ll ,  n»2,3,4,... 

0  V  12n5  J 


(78) 

(79) 


(80) 
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Considering  the  approximation  in  the  interval  (b,  o)  as 
before,  with  b  =  a(l  -  r^^),  c  =  a(l  4-  rg),  equating  the 
absolute  value  of  the  relative  error  at  z  =  -r^^  and  z  »  rg 
gives,  since  V  (z)  «  ^q/(1  ♦  z)t 


where  a  may  be  computed  once  r^  is  known. 

Choice  of  Interval 

Since  the  order  of  the  rational  approximation  has 
been  fixed,  the  only  way  that  its  precision  can  be  varied 
is  by  varying  the  end  points  of  the  interval  of  approxima¬ 
tion  (b,  o).  In  general  it  is  true  that  the  precision  of 
the  approximation  increases  if  the  interval  length  c  -  b 
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decreases.  let  us  deal  with  fixed-point  binary  operands  in 

the  range  (2“*^,  1),  and  pairtition  this  range  into  2^(2’^-!) 

oubintervals  of  equal  length  so  that  these  subintervals 

may  be  easily  identified  by  logical  circuitry.  A  computa- 

tion  was  made  using  the  Interval  (2  »  1),  subdivided  into 

24  subintervals.  It  v/as  found  that  the  greatest  relative 

error  occurred  in  the  lowest  subinterval,  for  which  o/b 

9/8.  This  is  not  surprising,  since  in  the  lowest  subinter- 

x/n 

val  X  '  has  its  greatest  curvature,  thus  causing  the 
greatest  inaccuracy.  A  calculation  of  the  worst  relative 
error  in  the  subinterval  (2"^,  2~^  f  2”^“^)  has  been  made 
for  the  square,  cube,  and  fourth  roots  (n  =  2,3,4,  respect¬ 
ively),  for  varying  numbers  of  subintervals.  The  restilts 
are  summarized  in  Table  4-2. 

Altho\;igh  the  operand  is  partitioned  into  2^(2*^  -  1) 
logically  identifiable  subintervals  (listed  as  "maximum 
number  of  intervals"  in  Table  4-2),  it  is  apparent  that 
all  of  these  need  net  be  distinguished  from  one  another. 
For  example,  consider  the  square  root  being  tedeen  in  the 
range  (l/4,  1)  using  3  subintervals  (l/4,  l/2) ,  (l/2,3/4), 
and  (3/4,  1).  The  maximum  relative  error  is  a  monotonical- 
ly  decreasing  function  of  the  lowest  subinterval's  end 
point  ratio  c/b,  and  thus  the  above  3  subintervals  can  be 
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reduced  to  2,  (l/4,  1/2)  and  (1/2,  1),  without  exceeding 
the  maximum  relative  error  in  the  lowest  suhinterval  (1/4, 
1/2).  Simileir  reductions  can  be  made  concerning  the  other 
entries  in  Table  4-2,  and  these  appeeu*  as  "minimum  number 
of  intenrals"  in  Table  4-2.  Por  first  order  Fade  approxi¬ 
mations  three  stored  constants  are  required  for  each  in¬ 
terval,  whether  the  ratio  of  polynomials  or  continued 
fraction  representation  is  used. 

If  the  problem  in  question  is  the  computation  of 

the  n^“  root  of  a  27-bit  binary  integer  to  an  absolute 

27 

precision  of  1  part  in  2  (fraction  part  of  IBU  7090 

floating-point  word),  then  since  the  n^^  root  lies  in  the 

range  (1/2,  1),  the  maximum  relative  error  is  2”  or  ap- 

—8 

proximately  1.49*10  .  Por  the  square  root  this  corre¬ 

sponds  to  the  entry  65/64  in  Table  4-2.  Por  this  relative 
error,  then,  the  size  of  the  table  of  stored  constants  re¬ 
quired  for  each  order  root  may  be  determined.  These  table 
sizes  are  given  in  Table  4-I(. 
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B 

No.  of 
Stored 
Constcmts 

Maximum 

Relative 

Error 

2 

354 

1.47-10“® 

3 

498 

1.16  •  10"® 

4 

639 

0.92-10“® 

5 

774 

0.75  •  10“® 

6 

915 

0.61  •  10“® 

7 

1050 

0.55-10“® 

Table  4-3:  Size  of  Stored  Constant  Tables  for  the 
Square  Through  Seventh  Roots,  First 
Order  Fade  Approximation. 

4-3:  Extensions  of  Kadler's  Method 

M.  Nadler  [?*  s]  has  outlined  an  Iterative  method 
published  by  FLower  in  1771 1  which  was  first  used  to  com¬ 
pute  high  precision  logaurithms,  but  which  is  also  useful 
in  computing  the  reciprocal  or  the  integral  roots  of  a 
given  number.  If  we  are  given  the  luimber  A,  we  may  find 
its  reciprocal  by  multiplying  it  by  a  series  of  constants 
such  that 

aTToi-*'!  •  (84) 

Dividing  (84)  by  A  yields  the  equation  that  is  necessary 
to  compute  the  reciprocal  of  A, 

TToi--A"^  .  (85) 
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Thus  (84)  and  (63) t  computed  separately,  form  a  pair  of 
Iterative  equations  that  yield  the  reciprocal  of  a  given 
number.  These  equations  na^  be  used  to  find  the  quotient 
B/A  by  using  the  pair  of  equations 
A  TT 0^-*  1 


B  TT  BA”^ 


(86) 


A  modification  of  this  algorithm  has  been  used  for  dlvls-> 
ion  in  the  Harvard  Mark  IV  computer,  and  is  given  by  Rich¬ 
ards  '[jLO]  as 

"l*!  -  “l'®! 

'’l.l  "  -  »l>“l  ’ 

v;here  is  the  dividend  and  the  divisor.  The  Iterative 
method  in  (8?)  will  converge  if  0  <  Dq<  1,  thus  making 

1,  i-  0,1,2,... 

The  Iterative  method  described  in  (84)  and  (83)  may 
be  extended  to  the  computation  of  n^  roots  by  employing 
the  following  extension,  developed  by  Nadler  [8]  to  ex¬ 
tract  the  square  root  of  a  number.  Let  the  following  pro¬ 
duct  be  formed  in  a  given  register: 

aTTcJ-I  .  (88) 

Raising  (88)  to  the  power  (n-l)/n  gives 

A(*^"l)/n  JJcJ-l- 1  .  (89) 
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(90) 


lJultiplying  (89)  by  then  given 

aTToJ-^- . 

and  thus  the  pair  of  equations  (88)  and  (90),  computed 
separately,  form  an  iterative  algorithm  which  may  be  em¬ 
ployed  to  extract  the  n^^  root. of  a  given  number. 
Computational  Conalderations 

Nadler  points  out  that  the  constants  c^^  may  be  of 
the  convenient  (in  the  binary  number  system)  form  1  'd:  2“^, 
p>l,2,3f*..,  so  that  multiplication  may  be  carried  out 
using  a  shift  and  an  addition.  Bichards  discusses  the  Har¬ 
vard  Llcurk  lY  division  algorithm  in  the  decimal  system 
where  the  same  sort  of  approximation  is  used,  i.e.,  2  - 
%  1  'f  d^,  where  d^  is  the  highest  order  nonzero  digit  of 
1  -  D^.  Suppose  that  A  JJ  1  monotonically  from  below, 
and  thus  Cj^  is  of  the  form  1  +  2“^.  After  a  few  Iterations 
the  process  will  reach  a  point  where  A*[T®j^  will  be  of  the 
form  0.1111***,  such  that  each  succeeding  iteration  will 
merely  add  another  "1"  to  the  string  already  obtained. 

Thus  if  k  significant  digits  of  the  quotient  are  desired, 
nearly  that  many  shift-addition  operations  will  be  requir¬ 
ed. 

Let  us  examine  the  precision  of  these  iterative 
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methods  t 

1) .  Divlaloa 

aTTo^-*-  1 

TTo^-^  A"^  -  Q 
A  TT  «  1  -  A  , 

then 

rrCi=  Q{1  -  A  )  ,  (91) 

and  therefore  the  relative  error  of  the  reciprocal  (or 
quotient)  is  the  same  as  that  of  the  operation  which  caus¬ 
es  the  reciprocal  to  be  formed. 

2) .  n^  Roots 

aTToJ-i 

A  TToJ-^-  A^/“  «  oc 

aTToJ-i-a  . 

Raise  to  the  power  (n-l)/n, 

^Cn-l)/n  ^  ^n-1  .  ^  ^  ,(n-l)/n  ^ 

Since  A  ■  <C  1 , 

^(n-l)/n  ^  n-1  ^  ^  ^  ^  . 

X  u 

Multiply  by  A^'^“  *»  OC  , 

aTTcJ'^-sss  OC  1^1  -  ~Aj  ,  (92) 
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and  thus  the  relative  error  of  the  root  Is  less  than 

the  relative  error  of  the  forcing  expression.  Therefore  if 

th 

the  desired  precision  of  the  n  root  is  specified,  the 
precision  to  which  the  forcing  expression  must  be  carried 
out  can  be  determined. 

In  the  case  of  the  n  rooting  algorithms  given  in 
equations  (88)  and  (90),  the  form  c^  *  1  +  2“^  poses  some 
problems.  The  relation  between  c^  and  o^"*  must  be  exact 
or  to  within  the  luaxlmum  tolerance  of  the  rooting  proced¬ 
ure  in  order  that  the  n^^  root  thus  extracted  be  correct 
to  the  specified  precision.  Richards  states  that  it  is  de¬ 
sirable  to  nial:e  the  capacity  of  the  registers  holding  the 
factors  in  question  one  or  two  digits  greater  than  the 
word  length  of  the  reciprocal  (or  root)  in  order  to  mini¬ 
mize  the  effect  of  round-off  errors.  In  the  case  of  the 
square  root  (n'*2),  the  problem  may  be  handled  in  the  fol¬ 
lowing  manner: 

»**-»  2 

Let  a  partial  result  be  given  as  ATTCii 

i 

this  result  be  \ised  to  determine  the  next  mifLtiplying  con¬ 
stant  c^  =  1  +  2“**,  p  >  1.  Row  if  p  is  large  enou^, 

D 

C  -  (1  +  ^  I  ■¥  2"P"^  , 

IB 

th\w  giving  o|  -  1  +  2“**  +  2’*^*’"^.  Therefore  the  factor 
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•f-r  2 

aTTc^  could  be  used  to  deteraine  the  squares  of  the  mult¬ 
iplying  constants,  and  thvis  the  constants  thenselves,  both 
in  an  exact  manner.  There  is  one  complication  ttiat  night 
arise  in  the  application  of  the  above  method,  however, 
namely  that  ATTcj^  >  1*  This  may  be  remedied  by  taking 
4*  1  ±  4  2’2P-2^  1  i  2-P,  using  c^  »  1  -  2“P 

when  A  TT  c^  >  1  and  c^^^ «  1  +  2”P  when  A  c^  <  1 ,  When 
AllCj^  ^  process  terminates  because  an  exact  root 

to  within  the  process  tolerance  has  been  found.  The  oon- 
2 

stants  c^  and  c^  imply  shift-addition  operations,  and  may 
be  utilised  in  the  same  manner  as  in  the  division  process. 
If  k  significant  digits  are  to  be  computed  in  the  square 
root  and  S  additional  digits  are  carried  along  in  the 
computation  to  counter  round-off  error,  then  the  effect  of 
2“2P“2  vanishes  when  2p+2  >  k+S,  or  p  >  i(k  +  S  -  2),  ap¬ 
proximately  the  midpoint  of  the  iterative  process,  and  the 
simpler  approximation  c^  «  1  ±  2”^  may  be  used  thereafter. 

For  the  cube  root  (n«3)i  the  approximation  to  the 
cube  of  the  constant  may  be  written  c^  a  (1  *  2“^^)^  *» 

1  ±  (2“P  f  2'P“^)  f  (2”^P"^  +  2-2p-2)  ^  but  this 

approach  is  rather  impractical,  since  the  approximation  Cj^ 

must  be  obtained  from  A  7T  c^  -►  1 ,  and  then  an  exact  corre- 

3  2 

opondence  betv/een  c^  and  c^  must  be  established  in  order 
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that  the  iterative  process  be  valid.  It  is  easily  seen 
that  for  this  type  of  approximation  defies 

simple  mechanization,  since  an  exact  correspondence  must 
be  established  between  c^  and  after  first  obtaining 

an  approximation  of  the  form  Cj^  “  1  2“^^  from  the  fac¬ 

tor  aTT  1» 

Stored  Tables  of  Constants 

Instead  of  forming  the  constants  c^  at  each  stage 
of  the  iterative  procedure,  we  could  examine  the  magnitude 
of  ATT  0^,  and  upon  the  resvilts  of  this  examination,  se- 
lect  the  appropriate  oonsteuats  o"  and  0^  from  stored  ta¬ 
bles.  The  determination  of  the  magnitude  of  Affo^  could 
be  made  by  direct  logical  access  to  its  bit  positions,  emd 
thus  the  appropriate  table  entries  could  be  selected  ac¬ 
cording  to  the  bit  configuration  sensed.  If  k  bits  of  ac¬ 
curacy  are  desired  in  the  n^^  root,  l.e.,  A^^*^  i  Ot  (1-2”^) , 
then  according  to  (92), 

aTToJwI  -  S^.2-^  .  (93) 

For  example,  let  us  consider  extracting  the  n^^  root  of  a 
k-blt  binary  integer  in  the  range  (2'*'^,  1)  with  absolute 
error  less  than  or  equal  to  1  part  in  2  .  If  it  is  desired 

MM  n 

to  force  AlTOj^  into  the  desired  z'ange,  i.e.. 
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f 


(94) 


1  -  ATToJ  <  1  +  ~-2"^ 

u  i  n 

using  just  one  nultiplication,  then  2^"^  +  2^“^  +  ...  «. 
entries  each  are  required  in  the  and  tables,  mak¬ 
ing  a  total  of  2^"*'^  -  stored  constants  required, 

Hov/ever,  since  aTTcJ  v/ill  be  in  the  desired  range  after 
one  multiplication,  the  c^  table  does  not  have  to  be  stor- 
ed  in  this  special  case  since  the  desired  root  oC  ft?  Ac 
may  be  obtained  directly  from  the  c  table.  If  this  is 
the  case,  about  233  million  stored  constants  would  be  re¬ 
quired  to  extract  the  square  root  of  a  27-bit  binary  inte¬ 
ger  (such  as  the  fraction  part  of  an  IBM  floating-point 
word)  in  one  nultiplication,  about  252  million  to  extract 
the  cube  root,  and  even  more  for  the  higher  roots.  These 
figures  are  of  course  entirely  out  of  the  question.  The 
nxuaber  of  stored  constants  required  to  force  A  TT  c^  into 
the  desired  range  may  be  reduced  by  expending  more  multi¬ 
plications,  but  the  c^  will  have  to  be  stored,  and  it  will 
require  the  expenditure  of  many  multiplications  in  order 
to  reduce  the  stored  tables  to  a  reasonable  size. 

4-4;  Truncated  Series  Method 

Suppose  it  is  desired  to  compute  the  value  of  a 
function  that  has  a  convergent  power  series  representation 
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2 

f(x)  *  bQ  f  +  bgX  +  ....  ,  and  suppose  further  that  it 
is  possible  to  mke  a  transforaation  on  f(x)  so  that  It 
nay  be  approximated  by  a  severely  truncated  series,  say, 
f(x)  c  bp  bj^x.  It  is  this  type  of  transfornation  which 
will  be  considered  in  the  computation  of  the  real 
root  of  a  real  number. 

The  binomial  expansion 

(1  t  A  )’•/“-  1  ♦  t  It  +•••  ‘95) 

is  an  alternating  power  series  convergent  for  |A|<1.  Let 
us  suppose  that  |A|  «  1  so  that 

(1  +  A  1  +1  ,  (96) 

the  eri'or  being  less  than  the  next  term,  l.e., 

|6|<^a“.  n.2.3,4....  (97) 

2n 

Let  it  be  stipulated  that  our  operands  are  binary  integers 

and  that  we  wish  to  compute  their  n^^  root  to  an  accuracy 

of  at  least  1  part  in  2^,  i.e.,  |€|<  2”^,  Thus 

p-k  X  xi-1  .  2 
2n‘^ 

or 

A  <  2-1^2  .  (98) 


84 


For  exanple,  if  our  operands  are  IBIA  7090  floating-point 
words  with  27-bit  fractional  parts,  then  k»27  and  the 
maximum  A  is  given  in  the  table  below. 


n 

Maximum 

A 

n 

Maximum 

A 

u 

Maximum 

A 

1 

-12 

1.00-2 

6 

1.34-2“^^ 

10 

1.66-2"^^ 

1.06-2“^^ 

B 

1.43-2"^^ 

11 

1.74-2“^^ 

1.15-2‘’^^ 

H 

1.51- 2“^^ 

12 

-12 

1.81-2 

H 

1.25-2'’^^ 

■ 

1.59-2“^^ 

13 

1.87-2"^^ 

Table  4-4:  Maximum  Value  of  A  in  the  Truncated 
Series,  k*27. 

For  the  values  of  n  shown,  A  *  2*"  is  a  satisfactory 
value  to  vise.  If  we  then  force  our  operand  into  the  range 
(1,1^2  ),  the  series  given  in  (96)  niay  be  used  to 

compute  the  n^^  root  of  x  to  within  the  naxlnium  allowable 
error. 

Transformation  of  the  Qperemd 

Considering  that  we  are  operating  upon  the  27-blt 
fractional  part  of  IBM  7090  floating-point  words,  it  is 
given  that  the  operand  will  be  in  the  range  2“^  <  x  <  1, 
n*2,3t4f.*  It  is  required  to  execute  some  sort  of 
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tranafornation  upon  the  operand  x  in  order  to  force  it 
into  the  interval  (1|  1*  2^^), 

Let  us  consider  a  tranafornation  used  by  Bener  M 
and  by  Cantor,  Estrin,  and  Turn  W  in  the  conputation  of 
the  lot;arithn  of  a  real  number.  Let 


VM 

z  «  xTTc. 

I 


(99) 


define  a  transformation  upon  x.  Then 


In  z  •  In  xTfc^  =  In  X  +  °i  » 

'  I 


and  thus 


In  X  «  In  z 


-  'll  °i  * 


(100) 


The  series  expansion  for  In  z  about  the  point  z  s  i  is 

in  z  «  (z-1)  -  |(z-l)2  +  i(z-l)5 - ,  (101) 

convergent  for  0<z<2.  Ifz-l  +  A  ,  where  A«  1, 
then 

ln(l  +  A  )  «  A  +  O(A^)  ,  (102) 


vath  error 

k|<|A^  .  (103) 

Thus  if  |z-l|  1  |a|  by  applying  the  transformation  given 
in  (99),  In  x  may  bo  computed  using  (100),  which  employs 
the  severely  truncated  series  in  (102).  The  additional  re¬ 
quirement  is,  of  course,  that  a  suitable  table  of  con- 
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stanta  In  be  available,  as  well  as  the  meems  for  ex¬ 
tracting  the  correct  entries  from  this  stored  table.  Can¬ 
tor,  Eotrin,  and  Turn  specified  an  error  bound  of 
and  thus  A  <  They  operated  upon  a  normalized 

(1/2  <  X  <  1)  27-bit  binary  operand  with  the  transforiMt- 
tion  (99)  using  tv/o  multiplications  and  two  stored  tables 
of  constants  to  force  the  operand  into  the  range  1  -  2"^^ 
<  z  <  1  +  2“^^,  The  transformation  was  defined  as  z 
Sj^o^x,  where 


-  2-‘  Int 


k  =  Int.(2'x) 


and 


c^  »  2'^^  Int.  ■!  2^^  ^ 


J  e  Int.(2^^aj^) 


Int.(  )  denotes  the  intecer  part  of  the  quantity  in  brack- 
6  7 

ets.  Therefore  2  <  k  <  2  ,  i.e, ,  k  =  64,65 ,127,  and 
2^3  -  2*^  -  2®  <  j  <  2^^,  i.e.,  J  «  8000,..., 81 91.  Thus 


there  are  64  constants  aj^  and  192  constants  c^  required  to 
transform  1/2  <  x  <  1  into  1  -  2“^^  <  z  <  1  +  2“^^,  where 


z  -  aj^OjX. 

In  a  similar  manner,  then,  let  iis  define  a  trans- 
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formation  that  will  force  2"“  £  x  <  1  into  1  -  A  <  2  < 


1 P  _ 

1  +  A  ,  where  A»  2  and  z  »  x  XT  I<et 


then 


Therefore 


xTTci 

I  ^ 


2““  <  X  <  1  , 


(104) 


^l/n  „  ^1/n  1/n 

I  1 


TT 


n-«r 


(105) 


where  1-2“  <z<l  +  2“  ,  and  thus  may  be  com¬ 

puted  using  the  series  in  (96)*  with  l€.l^  2“^*^,  Consider 
effecting  the  transformation  (104)  in  a  single  multiplica¬ 
tion,  z  s  xa^^.  In  order  to  bring  z  into  the  desired  range, 
the  first  15  bits  of  x  must  be  examined.  Let  k  —  Int(2^^x) 
where  2“^<  x  <  1,  and  thus  2^^"^  k  <  2^^,  n  32,5*4,  •••. 

Por  each  of  the  aj^  we  need  an  to  correct  thus 

*l>/n 

necessitating  two  tables,  and  a^  '  .  Table  4-5  gives 
the  total  number  of  stored  constants  required  in  the  sin¬ 
gle  multiplication  scheme. 


Total  no. 
of  const. 


Total  no. 
of  const. 


Total  no. 
of  const. 


Table  4-5:  Number  of  Stored  Constants  Required  for 
nth  Hoot,  Single  Multiplication  Scheme. 
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Hote  that  the  constants  a^^  have  a  snail  number  of  nonzero 
bits,  and  thus  if  a^^  is  considered  as  the  multiplier,  the 
computation  of  z  «  xSj^  is  a  "short"  multiplication.  If 
n  >  13,  either  more  leading  bits  of  x  will  have  to  be  ex¬ 
amined,  necessitating  expansion  of  the  stored  tables,  or 
an  additional  multiplication  will  have  to  be  executed,  al¬ 
so  introducing  additional  constants.  The  present  discus¬ 
sion  will  be  limited  to  the  cases  where  n  is  not  large 
enovigh  to  require  such  changes. 

In  order  to  reduce  the  number  of  stored  constants 
required,  let  us  consider  forcing  z  into  the  desired  range 
using  two  multiplications,  i.e.,  z  •=  xSj^Cj.  Following  Ceui- 
tor,  Estrin,  and  Turn,  let  the  transformation  sequence  be 
(2"“,  1)  -  (1  -  2"^,  1  +  2“^)  (1  -  2“^^,  1  +  2“^^),  the 

respective  ranges  of  x,  xSj^,  xSj^Cj.  Define 


k  »  Int.(2^x)  , 


and 


j  «  Int.(2^^xaj^) 

The  reuiges  of  k  and  j  are  2^”'’' <  k<  2^,  n“2,3t4,5f 


(106) 

(107) 

(108) 
(109) 
and 
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2^^  -  2^  -  2^  ^  j  <  2^^.  Thus  there  are  no  more  than  62 

constants  for  n  <  6,  and  96  constants  c^.  In  addition 

to  these  constants,  there  nust  be  tables  of 
■•1  /n. 

c~  stored.  Table  4-6  ^ives  the  total  number  of  con- 
J 

stants  required  in  the  two  multiplication  scheme  for 


values  of  n  between  2  and  5.  If  n  >  5  the  a^^  and  a^^' 


-1/n 


n 

Total  no. 
of  const. 

n 

Total  no. 
of  const. 

2 

288 

4 

312 

3 

304 

5 

316 

Table  4-6:  Total  Humber  of  Constants  Required  for 
nth  Root,  Two  I’.ultipli cation  Scheme. 


tables  v/ill  have  to  be  expanded,  with  a  resultant  reduc¬ 
tion  in  the  size  of  the  c.  and  tables.  The  raultipli- 

^  ^  -1/n 

cations  xe^  and  xSj^c^  are  "short"  and  za^^  '  and 
are  regular  length. 

It  should  be  noted  that  this  "sequential  table 
lookup",  abbreviated  STL,  method  as  Cantor,  Estrin,  and 
Turn  call  it,  is  quite  similar  to  Hadler's  method  for  com¬ 
puting  roots,  in  that  they  both  force  the  operand  into  a 
predetermined  range.  However,  tlie  difference  betv;een  the 
two  methods  is  the  width  of  this  reuige.  In  Kadler's  method 
the  operand  has  to  be  forced  into  such  a  narrow  range  that 
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either  too  larce  a  table  of  stored  constants  or  an  unsat¬ 
isfactory  nunber  of  nultiplications  is  required. 

4-5:  LoRarlthn-Antilofrarithm  Approach  to  n^^  Rooting 

If  it  is  required  to  extract  the  n^^  root  of  a  giv¬ 
en  real  nunber,  the  following  sequence  of  operations  may 
be  performed: 


Figure  4-1:  Computational  Sequence  for  the  Ipg- 
Antilog  Method. 

The  operation  e*  is,  of  course,  the  eintilog  operation  cor¬ 
responding  to  In  X. 

Let  us  exeunine  a  veuriable  structure  computer  devel¬ 
oped  by  Cantor,  Estrin,  eind  Turn  [2]  that  computes  the  el¬ 
ementary  functions  In  x  euad  e*.  The  essential  character  of 
their  sequential  table  lookup  (STL)  algorithm  has  been 
given  in  the  section  (4-4)  dealing  with  the  truncated  ser- 
ies  method  for  computing  n  roots.  Cantor,  Estrin,  and 
Turn  developed  a  combined  structure  that  handles  both  In  x 
and  e  as  well  as  separate  structures,  and  it  is  this  com¬ 
bined  structure  whose  characteristics  will  be  given. 

The  constants  necesseury  to  compute  In  x  and  e* 
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are  stored  in  a  table  of  1057  words  of  ninimum  length  31 
bits  and  maxloun  length  44  bits.  In  addition,  a  36-blt 
accunulator,  a  35-bit  adder,  a  36-bit  nultiplicand  regis¬ 
ter,  and  a  14-bit  UQ  register  are  computational  registers 
required.  Besides  the  necessary  menory  access  hardware  re¬ 
quired  to  select  the  desired  constants  from  memory,  there 
is  also  the  usual  control  and  decoding  circuitry  that  is 
necessary  to  make  the  process  function. 
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OJ^PTEK  V 


Comparison  of  the  Rooting  Methods 
5-1 !  Tlmin^t  Measures 

Each  rooting  method  considered  Is  made  up  of  a 
number  of  olementary  arithmetic  and  logical  operations. 
However,  each  method  does  not  necessarily  consist  of  the 
same  operations,  and  the  operations  occur  In  varying  pro¬ 
portions  according  to  the  method.  Therefore,  as  a  first 
step,  the  timing  evaluations  will  be  made  In  terms  of  the 
elementary  operations.  The  operations  used  are  defined 
as  fixed-point  binary,  vfith  a  fixed  word  length.  Let  the 
following  symbols  be  Introduced:: 

S  *  one  bit-position  shift} 

A « addition  or  subtraction; 

M = f ull  word-length  multiplication; 

D  *  division; 

FA*  memory  access; 

Mg =  short  multiplication,  where  a  short  multipli¬ 
cation  la  one  v;hoae  multiplier  Is  substant¬ 
ially  shorter  than  the  full  word  length. 

5-2 :  Dealing  with  the  Floating-Point  Exponent 

It  was  previously  pointed  out  that  the  fractional 
part  of  a  floating-point  operand  may  bo  shifted  as  many 
as  n-1  bit  positions  to  the  right  before  execution  of  a 
fixed  point  rooting  process,  depending  upon  how  nearly 
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the  exponent  was  a  multiple  of  n.  For  a  general  value  of 
n,  the  only  way  to  determine  this  property  Is  to  perform 
the  division  b/n,  where  b  Is  the  exponent,  examine  the 
remainder  r  (b/n«Int,  {b/n^  +  r/n),  and  shift  the  frac¬ 

tion  part  n-r  places  to  the  right  If  r  la  nonzero.  The 
root  exponent  la  Int,  {b/n}  +  1  If  r  >  0  and  b/n  If  raO. 
For  an  IBM  floating  point  v/ord,  the  division  b/n  la  a 
maximum  of  8  bits  long,  and  thus  the  maximum  time  taken 
to  deal  with  the  exponent  Is  this  8-blt  division  plus  n-1 
one  bit  position  shifts.  Therefore,  this  time  must  be 
added  onto  the  maximum  expected  execution  times  of  those 
methods  which  employ  operations  on  Just  the  fractional 
parts  of  a  floating-point  word.  These  methods  are  the 
binomial  theorem  method,  the  Euler  Iteration  formulae, 
the'  truncated  series  method,  and  the  Fade  approximation. 
5-35  The  Binomial  Theorem  Method 

A  sub-unit  of  the  binomial  theorem  n^^  rooting 
process,  an  Iteration,  has  been  previously  defined  ass 
1).  formation  of  the  trial  factor; 

2}.  formation  of  the  correction  if  the  remainder  Is 

negative; 

3) .  additlon/subtractlon  of  the  trial  factor  and  correc¬ 

tion  to  the  remainder; 

4) ,  shifting  out  leading  zeros  from  the  new  remainder; 

and 
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5).  augmentlns  the  partial  root  with  the  appropriate 
bits  according  to  the  results  of  steps  3  and  4. 

Ah  Iteration  Is  represented  schematically  In  Figure  5-1. 
The  most  tlme-consumlns  part  of  the  Iteration  occurs  In 
forming  the  trial  factor  and  the  correction  at  the  begin¬ 
ning  of  the  Iteration.  For  the  n^^  root,  the  trial  fact¬ 
or  Is  a  polynomial  of  degree  n-1  In  the  partial  root 
®J-1»  correction  Is  a  polynomial  of  degree  n-2 

In  the  coefficients  being  the  binomial  coefficients 

multiplied  by  a  power  of  2  In  the  case  of  the  trial  fact¬ 
or  and  Integers  of  approximately  the  same  magnitude  as 
the  binomial  coefficients  multiplied  by  a  power  of  2  In 
the  case  of  the  correction.  Since  the  trial  factor  la  a 
higher  degree  polynomial  than  the  correction,  the  forma¬ 
tion  of  the  trial  factor  la  the  longer  operation  of  the 
two.  IrVhat  Is  required,  then.  Is  to  form  successively  the 
powers  of  from  the  square  to  the  (n-1)®^,  and  form 

the  trial  factor  and  correction  polynomials  using  the 
appropriate  coefficients, 

A  highly  parallel  method  of  doing  this  la  shown 
In  figure  5-2.  The  trial  factor  la  represented  symbol¬ 
ically  as  CQ  +  oj^aj,]^  +•••+  ^n-l^jli*  correction 

as  Cq  +  claj.;^  +  °n-2®jl!*  ''^®*'®  Oo»  ®1*  —  •  °n-l| 

I  •  • 

°0»  ®  !*•••  »  ®n-2  ®^®  Integers  times  a  power  of  2, 

Assuming  the  posltlonlngs  can  be  accomplished  In  one  or  a 
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Pigvire  5-1:  Schematic  Representation  of  an  Iteration. 
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TRIAL  FACTOR  CORRECTION 
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few  one  bit-position  shift  times,  the  entire  process  of 
formlns  the  trial  factor  and  correction  can  be  done  In 
the  time  it  takes  to  form  the  n-2  powers  of  plus 

the  time  taken  to  form  the  last  term  of  the  trial  factor. 
Done  in  this  way,  the  arithmetic  units  which  might  be 
used  for  the  formation  of  the  trial  factor  and  the  corre> 
tlon  are  3  multipliers,  2  multiple  place  shifting  ma¬ 
trices,  and  2  adders.  During  the  early  stages  of  the 
rooting  process  the  partial  root  consists  of  only  a 

few  digits,  and  near  the  end  consists  of  nearly  the  full 
word  length.  Thus,  the  n-2  multiplications  used  to  form 
the  powers  of  have  multipliers  with  an  expected 

length  of  one-half  the  full  wor^  length,  and  therefore 
are,  on  the  average,  short  multiplications. 

If  more  conservatively,  a  single  arithmetic  unit 
is  used,  assuming  also  that  one  shifting  matrix  Is  avail¬ 
able  to  execute  the  various  variable  length  shifts  re¬ 
quired,  the  computation  of  the  trial  factor  and  correct¬ 
ion  polynomials  takes  3n-5  short  multiplications,  2n-3 
additions,  and  2n-3  variable  length  shifts,  for  n  *  3, 
5»*»*  .  To  obtain  a  maximum  time  estimate,  the  mini¬ 
mum  figure  of  merit  of  1.00  root  bits  per  iteration 
could  be  essumed,  and  thus  n^^  rooting  process  could 
take  as  many  as  k  iterations  (k  being  the  number  of  bits 
in  the  fraction  part  of  the  floating-point  word),  each 
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Iteratlsn  consisting  of  3n-5  short  nultlpllcatlons ,  2n-3 
variable  length  shifts,  1  bit-position  shift  time  to  aug¬ 
ment  the  partial  root,  and  2n-2  additions.  To  extract  ths 
n^^  root  of  a  floating-point  binary  operand,  then  it  will 
take  a  maximum  of  k^(3n-5)Mg  +  (2n-3)S*  +  3  4*  (2n-2)A^  + 

(n-l)S  -f  D(8),  where  S*  Is  a  variable  length  shift  exe¬ 
cuted  by  a  shifting  matrix  and  D(8)  Is  an  8-blt  division, 
for  n»3,  4,  5»  •••  •  If  a  shifting  matrix  is  not 
employed,  the  rooting  process  for  n>2  becomes  extremely 
time  consuming  due  to  the  large  number  of  sequential  one 
blt-poaltlon  shifts  needed  to  position  the  terms  of  the 
trial  factor  and  correction.  The  square  root  (n«2)  has 
been  treated  as  a  special  case  In  Chapter  III. 

5-4 :  The  Euler  Iteration  Formulae 

The  computational  speeds  of  the  Euler  Iteration 
formulae  depend  upon  their  order  (and  thus  complexity), 
and  upon  the  number  of  times  they  must  be  applied.  Since 
the  number  of  applications  (or  Iterations)  depends  upon 
the  precision  desired  and  the  order  of  the  root  desired, 
timing  evaluations  will  be  made  on  a  “per  Iteration" 
basis  enl  Iterations  may  be  cascaded  to  meet  the  computa¬ 
tional  needs  of  particular  problems. 

The  first  six  Euler  Iteration  formulae,  l,e., 
those  described  earlier,  will  be  considered.  Table  5-1 
gives  the  execution  time  of  one  iteration,  ■  ^pq. 
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using  sequential  computation  v/lth  a  single  arithmetic 
unit.  All  operations  are  fixed-point  binary,  and  "red 
tape"  and  data  transfer  operations  are  neglected.  Also, 
the  time  taken  to  deal  with  the  floating-point  exponent 
Is  not  Included  In  the  timing  table.  Table  5-1  was  com¬ 
piled  for  a  fixed  n,  l,e.,  all  the  expressions  containing 
n  were  precomputed  and  assumed  available  at  the  time  they 
were  required. 


Table  5-1* 


Computational 
Properties  of  the 
First  Six  Euler 
Formulae ,  Sequen¬ 
tial  Com.putatlon 
Using  One  Arith¬ 
metic  Unit,  n 
Fixed, 

*pj»No,  of  mult, 
used  to  form 
powers  of  x. 


If  n  becomes  substantially  large,  the  computation 
of  x*’  takes  the  major  portion  of  the  Iteration  computa¬ 
tion  time.  Therefore,  there  Is  a  point  at  which  the  com¬ 
putation  of  a  single  Euler  Iteration  becomes  more  time 
consuming  than  using  another  method  to  compute  the  n^^ 
root,  and  thus  the  computation  of  x*'  enters  as  a  limiting 

factor  In  the  usefulness  of  the  Euler  Iteration  formulae, 
0 

The  Fade  Arrroximation  Method 


Approx. 

A 

M 

Ms 

D 

P* 

M 

O 

o 

1 

n-1 

0 

1 

n-2 

^10 

3 

n+3 

0 

1 

n-1 

^01 

2 

2n-l 

4 

1 

2n-2 

^20 

4 

n+6 

0 

1 

n-1 

^11 

4 

n+1 

4 

i 

2 

n-1 

^02 

4 

1 

2nO 

6 

1 

2n-2 
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The  first-order  rational  aprroxlmatlons  considered 


could  take  two  equivalent  fonno,  either 


1). 


^l/n  ao  4-  ®1* 

A>  S.  '  f 

1  ♦  b]^x 

xl/n  a  Aq  +  - 

X  + 


However,  even  though  the  two  representations  yield  equal 
results  to  the  desired  precision,  they  are  not  computa¬ 
tional  equals.  Sequential  computation  of  the  first  rep¬ 
resentation  (ratio  of  polynomials)  takes  2M  +  2A’  +  ID 
+  3MA  +  (n-l)S  •►D(8),  and  the  second  (continued  frac¬ 
tion)  2A  +  ID  +  3MA  +  (n-l)3  +  D(8).  Clearly  the  con¬ 
tinued  fraction  representation  is  preferable  timewise, 
the  execution  times  given  being  those  for  a  floating¬ 
point  operand, 

5-6:  Re  lection  of  Nedler*  a  Method 

Although  they  are  theoretically  sound,  the  higher 
order  extensions  of  Nadler's  method  for  calculating  n^^ 
roots  present  unreasonable  demands  in  storage  (such  as 
several  million  stored  constants  being  required  in  a 
sequential  table  lookup  scheme),  or  are  grossly  incon¬ 
venient  or  Imr^osslble  to  mechanize  as  In  the  case  of  the 
blt-by-blt  method  of  forcing  the  factor  A  J]" o”  to  unity, 
because  of  the  exact  relationship  demanded  between  c^ 
and  cV“^  . 
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The  similarity  between  Kadler's  method  and  the 
truncated  series  method  points  up  the  superiority  of  the 
latter-  as  far  as  the  number  of  stored  constants  required, 
since  In  the  truncated  series  method  the  quantity  being 
forced  to  unity  does  not  have  to  approach  this  value  as 
closely  as  In  Nadler*s  method,  and  although  more  arith¬ 
metic  operations  are  expended,  the  number  of  stored  con¬ 
stants  required  for  the  sequential  table  lookup  approach 
In  the  truncated  series  method  Is  far  less. 

Therefore,  Nadler's  method  la  regarded  as  grossly 
undesirable  In  view  of  the  much  simpler  and  more  effic¬ 
ient  n^^  rooting  methods  available,  and  will  be  elimi¬ 
nated  from  further  consideration, 

5-7 !  The  Truncated  Series  Method 

By  applying  the  transformation  m  «  xTJe^  to  the 

4* 

operand  x  in  order  to  force  z  Into  the  range 
(1  -  IaI,  1  ♦  |A|  ),  It  v;a3  shown  that  z^/’^  could  be  com¬ 
puted  using  the  severely  truncated  series  z^/*^  as  1  +  A/n, 
where  |A|  was  chosen  to  satisfy  an  error  criterion.  The 
transformation  w^s  accomplished  In  essentially  the  number 
of  short  multiplications  necessary  to  force  z  Into  the 
desired  range,  and  an  equal  number  of  "correcting"  full 
word-length  multiplications  were  applied  to  z^/”  In 
order  to  obtain  xl/". 

The  computational  sequence  Is  given  In  Figure  5-2 , 
The  two  previously  discussed  transformations  were  the 
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slnsle-  and  two-multl “llcntlon  types,  applied  in  the  case 
where  1^1:^  to  satisfy  l6.|  $  2"^'^.  Since  has  28 

significant  bits  In  the  case  of  an  IBM  floating-point 
binary  word,  27  of  them  to  the  right  of  the  binary  point, 
and  since  |A|  <  2"^^,  the  division  ^/n  need  only  be 
carried  out  14  places  at  the  most,  depending  upon  the 
value  of  n. 


Figure  5-2 :  Computational  Sequence  of  the 

Truncated  Series  Method,  Mantissa  Fart, 

Thus  ^/n  Is  a  "short"  division,  and  for  the  sake  of  arg¬ 
ument  will  be  considered  as  one-half  a  full  word-length 
division.  Another  point  arises,  namely,  whether  A  la 
positive  or  negative.  If  A>  0,  we  need  only  consider 
that  part  of  which  lies  to  the  right  of  the  binary 

point  In  the  division  A/n.  If  A<  o,  however,  the 
division  |Al/n  must  bo  performed  and  the  sum  1  -  |A|/n 
formed.  This  Implies  two  subtraction  ooeratlcns,  end  It 
will  be  assumed  that  these  must  have  token  place  In  order 
to  create  a  worst-case  example, 

1) ,  Single  multiplication,  z  «  xa^^  : 

maximum  execution  time  »  1M_  +■  IM  +  2A  +  (l/2)D  +2MA 

s 

2) .  Two  multiplications,  z  =  * 

maximum  execution  time  =  2Mg  +  2M  +  2A  +(l/2)D  +  4KA. 
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5-8  :  The  Locc -Exponential  Method 

The  In  x  and  e*  functions  mechanized  In  the  vari¬ 
able  structure  computer  of  Cantor,  Estrln,  and  Turn 
operate  upon  IBM  7090  floatlns-polnt  words  (8-blt  expon¬ 
ent,  27-blt  fraction,  and  sign)  and  It  Is  for  such  oper¬ 
ands  that  the  execution  times  will  be  given.  Two  timings 
are  given,  one  for  maximum  parallelism  and  the  other  for 
a  sequential  computation. 

1) .  In  X  : 

Parallel  =  IMA  4  2K  +  lA  +  IN  ; 

3 

Sequential  =  2KX  +  2M  +  3^^  +  IN  ; 

2) .  a-  : 

Parallel  -  1C  +  IMA  +  2Mg  +  3A  +  IN  ; 

Sequential  =  1C  3MA  +  2M„  4  4A  4  IN, 

s 

where  N  -  normalization  and  C  =  conversion.  The  normal¬ 
ization  and  conversion  consist  of  a  controlled 
sequence  of  one  blt-poaltlon  shifts.  The  normalization 
takes  a  minimum  of  0  and  a  maximum  of  27  shifts,  and  the 
conversion  a  minimum  of  0  and  a  maximum  of  26  shifts.  It 
Is  seen  that  the  difference  between  the  parallel  and 
sequential  computations  for  In  x  is  IMA  4  2A,  and  for 
e*  ,  21'A  4  lA.  In  order  to  determine  the  total  time 
needed  to  compute  xVn, 

the  Individual  computations  must 
be  cascaded  Into  the  sequence  shown  In  Figure  4-1,  Since 

the  difference  In  computation  time  between  the 
log-exponential  em-loylng  parallel  and  sequential  In  x 
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and  e*  Is  only  3MA  +  JA,  the  soquentlal  methods  will  bo 
considered.  These  are  the  algorithms  executed  by  the 
variable  structure  computer  designed  by  Cantor,  Estrln, 
and  Turn.  The  total  computation  time  for  the  log-expo- 
nentlal  root  Is  a  maximum  of  5KA  +  AM_  +  7A  +  803, 

O 

This  time  Is  for  the  combined  In  x-o*  structure 
employing  1037  stored  constants. 
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CHAFTER  VI 


Conclusion 

The  component  terms  in  the  maximum  expected  execu¬ 
tion  tlcoa,  In  terms  of  the  basic  arithmetic  and  loslcal 
operations  previously  set  forth,  are  given  In  Table  6-1 
for  the  v.'orkable  n"^^  rooting  methods. 

In  some  instances  it  may  be  advantaseous  to  com¬ 
bine  two  of  the  previously  described  methods  in  a  sequen¬ 
tial  fashion  to  obtain  an  advantage  in  speed.  One  such 
examcle  is  the  use  of  the  Euler  Iteration  formulae  plus 
an  initial  approximation.  When  ap'rlylns  the  Euler  itera¬ 
tion  formulae  it  is  cominon  practice  in  programming,  and 
indeed  desirable,  to  lead  into  the  iterations  with  a  good 
approximation  to  the  desired  root,  thus  minimizing  the 
number  of  time-consuming  iterations  required  for  full 
precision.  The  only  iterr^tion  formula  v/orthy  of  consid¬ 
eration  in  view  of  the  STL  lo^-ex'ronentlal  method  is  the 
Wev^ton  -  Raphson  formula,  Ioo»  'Tbls  is  a  second-order 
formula,  i.e.,  if  a  reasonably  close  approximation  is  ob¬ 
tained,  the  error  is  approximately  squared  with  each  suc¬ 
ceeding  iteration.  For  ex?mr)le,  if  we  use  a  Fade  approx¬ 
imation  to  an  error  1^1  ^  2"^^  (relative  error  =  ^ 

and  apply  one  Newton  -  Raphson  iteration  to  this  initial 
value,  the  result  will  be  within  the  error  bound 
2“^7,  The  commutation  time  will  be  3NA  +  2A4-1D  for  the 
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Fade  approximation,  plus  lA+(n-l)K+lD  for  the  Newton- 
Kaphaon  Iteration,  plus  (n-l)S+D(8)  to  reckon  the  expo¬ 
nent,  making  a  total  of  3MA +3A  +  (n-l)M  +  1D  + D(8)  +  (n-l)3. 

The  Pad^  approximation  and  truncated  series  mech¬ 
anizations  are  organizationally  similar  to  that  of  the 
STL  log  -  exponential  method,  and  are  given  In  Figures 
6-1  and  6-2,  The  micro  flow  charts  for  these  methods 
(mantissa  part)  are  given  In  Figures  6-3  and  6-4,  Both 
the  mechanization  and  micro  flow  charts  for  the  STL  log- 
exponential  method  are  given  In  the  report  by  Cantor, 
Estrln  and  Turn  [2], 

The  timing  evaluations  of  the  various  methods 
were  given  as  sums  of  multiples  of  the  basic  arithmetic 
and  logical  operations.  In  order  to  directly  compare 
one  method  with  another,  a  more  common  time  base  must  be 
specified.  One  way  of  doing  this  la  to  designate  one  of 
the  basic  operations  as  a  unit  time,  and  then  express 
the  remaining  operations  as  multiples  of  this  time  unit, 
giving  all  execution  times  In  terms  of  the  time  unit. 

As  an  example.  If  we  were  to  choose  the  IBM  7090 
operation  timings,  using  the  one  bit-position  shift  as 
our  time  unit,  we  would  obtain  the  ratios  given  In  Table 
6-2,  A  one  bit  position  shift  In  the  IBM  7090  takes 
1/12  of  a  2,18  microsecond  machine  cycle,  or  O.I83 
microseconds. 
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Figure  6-*1 1  Fade  Approximation;  Organization. 


Accumu* 

lator 


Figure  6*2:  Truncated  Series;  Organization* 


no 


Figure  6-3:  Micro  Plow  Chart  for  Fade  Approxioation, 
n  Fixed,  Maximum  Relative  Error 

1.5 ‘10”®. 


Ill 


Figure  6-A:  Micro  Flow  Chart  for  Truncated  Series 
Method »  Two  Multiplication  Scheme » 
n  Fixed. 
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Op. 

Kin. 

Avg. 

Max. 

s. 

1 

1 

1 

VA 

6 

6 

6 

A 

6 

6 

6 

M 

0 

86 

108 

Ms 

0 

29-43 

36-54 

D 

0  1 

108 

108 

Table  6-2:  Operation  Ratios  for  Fixed-Point 
Arlthnetlc  and  Logic  In  the  IBM 
7090  Arithmetic  Unit. 

In  the  above  table,  all  operations  are  fixed- 
point  binary,  with  a  full-word  length  of  27  bits.  Oper¬ 
and  fetch  and  operation  decoding  were  not  included  In  the 
above  timings.  The  short  multiplication,  ^5g,  may  be  any¬ 
where  from  1/3  to  1/2  the  length  of  a  full -word  multipli¬ 
cation,  depending  upon  the  method  In  v:hlch  It  la  used. 

If  the  maximum  operation  ratios  are  substituted  Into  the 
timing  expressions  In  Table  6-1,  the  value  of  n  at  which 
each  method  becomes  as  tlne-consumlng  as  the  STL  log- 
exponential  method  may  be  estimated.  The  key  to  Table 
6-3  Is:  (4)  No  crossover;  takes  longer  than  In-exp. 

(-);k  No  crossover  for  reasonable  slse  n;  Takes 
less  than  In-exp,  k*  approx,  fraction  of 
In-exp.  time. 

♦  Per  Iteration 
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Method 

Timing  Cross¬ 
over  with  STL 
log-exp. 
method. 

Binomial  Theorem: 

n-  2 

(-) 

I 

n>  2 

n>2 

Euler  Formulae:  t 

1 00 

n  >  4 

lio 

1 01 

l20 

Jll 

* 

1 02 

(♦: 

Fade  Approximation 

1 

o 

• 

Truncated  Series: 

1  mult. 

(- 

|r  0.5 

2  mult. 

)t  0.9 

Fade  -  one  Iqq 

n  >2 

Table  6-3:  Timing  Crossover  Points  for  the 
n^-h  Rooting  Methods . 


The  stored  table  requirements  of  those  methods 
which  require  stored  constants  are  summarized  In  Table 


6-4. 


rn 

Approx.  Table 
Size  for  small 
values  of  n 

Table  Size 
Crossover 
with  STL  In- 
exp.  method 

Method 

STL  In-exp. 

1037* 

■KHjH 

Truncated  Series:  1  mult. 

12, OC 0-16 .000 
288-316 

(n.--2,3,4,5)  2  mult. 

Fade  Aooroxlmatlon 
(n»2,3.4,5.6.7) 

354-1050 

Table  6-4:  Stored  Constant  table  Size  Crossover  Points 
for  the  n^h  Rooting  Methods, 

Key:  (4>)|k  no  crossover;  greater  than  In-oxp.  karaile  • 
(-);k  no  crossover;  less  than  In-exp.  ks  ratio. 

*  Independent  of  n. 
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Conclualona 


Of  all  the  rooting  methods  examined »  the  STL 
log-exponential  method  has  been  found  to  bo  the  moat 
veraatlle,  and  In  most  cases  the  fastest.  Traub  reports 
a  similar  conclusion  in  his  comparison  of  programmed  It¬ 
erative  methods  for  the  n^^  roots  ^12^  versus  use  of 
In  X  and  e^  subroutines. 

For  the  special  case  of  the  square  root  the  bino¬ 
mial  theorem  method  Is  desirable  from  both  the  timing 
and  mechanization  viewpoints.  In  fact,  the  square  root 
could  be  Incorporated  In  a  conventional  arithmetic  unit 
with  the  addition  of  some  logical  circuitry  because  of 
Its  close  relationship  to  the  division  operation.  The 
nonrestoring  square  rooting  method  has  been  found  to  have 
a  time  advantage  over  the  related  restoring  method,  as 
vms  borne  out  by  the  simulation. 

For  the  higher  roots,  the  Fade  approximation  and 
the  truncated  series  methods  are  fester  than  the  log- 
exponential  method.  Both  methods  require  tables  of 
stored  constants  corresponding  to  each  value  of  n,  the 
truncated  series  method  requiring  a  lesser  number  of  con¬ 
stants.  However,  the  truncated  series  method  encounters 
difficulties  when  the  operand  Is  near  the  Interval  end¬ 
point  2“*^  when  n  gets  large,  whereas  the  Fade  approxima¬ 
tion  has  no  such  difficulties,  and  thus  the  latter  is 
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preferable  when  n  la  larse. 

The  Euler  Iteration  formulae  are  entirely  too 
tine  consumlns  to  be  mechanized  because  of  the  superior¬ 
ity  of  other  available  methods.  Extensions  of  Kadler'a 
method  defy  reasonable  mechanization,  and  thus  are  not 
useful. 

It  Is  recommended  that  the  nonreatorlng  version  of 
the  binomial  theorem  method  be  used  for  the  square  root. 
For  hlisher  roots,  the  Fade  approximation  or  the  truncated 
series  methods  should  be  used  If  the  problem  In  question 
Is  sufficiently  specialized  to  require  a  large  number  of 
nth  roots  for  fixed  n.  Otherwise,  for  the  sake  of  maxi¬ 
mum  versatility  per  unit  equipment  expenditure.  It  la 
recommended  that  the  STL  log-exponential  method  be  used. 

Among  other  procedures  which  might  well  be  consid¬ 
ered  In  further  study  of  this  problem  are  those  making 
use  of  unconventional  number  representations. 


116 


BIBLIOGRAPHY 


1.  Berner,  R. ,  "A  Subroutine  Method  For  Calculating 
Logarithms",  Cocua.  ACM  1:  5-7,  Hay,  1958. 

2.  Cantor,  D. ,  Zstrin,  G.,  and  Turn,  R. ,  "Computation  of 

Elementary  Functions,  In  x  and  e^,  in  a  Variable 
Structiu:e  Computer",  University  of  California,  Loa 
Angeles,  Dept,  of  Engineering,  Tech.  Report  no.  61-16, 
1961. 

3.  Coveyou,  R.  R. ,  "Serial  Correlation  in  the  Generation 
of  Pseudo-Reindom  Humbers",  Jour.  ACM  7:  72-74, 

January,  I960. 

4.  Preiraan,  C.  V, ,  "Statistical  Aneilysis  of  Certain  Binary 
Division  Algorithms",  Proc.  IKE  49:  91-103,  January, 
1961. 

5.  Hald,  A. ,  Statistical  Theory  With  Engineering 
Applications ,  Wiley,  New  York,  1952,  Chapter  7. 

6.  International  Business  Machines  Corporation,  IBM 
Customer  Engineering  Manual  For  the  7090  Data 
Processing  System,  Vol.  1,  Form  223-6860-1 ,  1959. 

7.  I.'adler,  M. ,  "Division  By  the  Method  of  Radixea  in 
Computing  Machines",  Stroje  na  zpracovani  informaci 
(Praha)  4:  79-102,  1956  (in  Czech). 

8.  Nadler,  U.,  "Division  and  Square  Root  in  the  Quater- 
Imaginary  Number  System",  Comm.  ACM  4:  192-193, 

April,  1961. 

9.  Ralston,  A.,  and  V/llf,  H. ,  Ed.,  Mathematical  Methods 
For  Digital  Computers ,  Van  Ilostrand,  Mew  York,  I960, 
Chapter  1. 


117 


10.  Richards,  R.  K. ,  Arithmetic  Operations  in  Digital 
Coopvitera.  Van  Kostrand,  Kew  York,  1955,  pp.  279-282. 

11.  Rotenberg,  A.,  "A  Kew  Pseudo-Random  Number  Generator", 
Jour.  ACM  7:  75-77,  January,  I960. 

12.  Traub,  J.  F. ,  "Comparison  of  Iterative  Methods  For  the 
Calculation  of  Nth  Roots",  Comm.  ACM  4:  143-145, 

March,  1961. 

13*  Traub,  J.  F. ,  "On  a  Class  of  Iteration  Formulas  and 
Some  Historical  Notes",  Comm.  ACM  4:  276-278, 

Jiuie,  1961. 


118 


APPENDIX 


Programs  for  the  Property  Distribution 

MAIN;  Calls  the  Input  and  Initialization  routine,  gener¬ 
ates  the  pseudo-random  operands,  and  takes  their 
square  root  one  at  a  time,  calls  the  subtotaling 
and  output  routines  every  1024  operands*  Flow  dia¬ 
gram  given  in  Fig,  A-1, 

INPUT:  Essential  duty  Is  to  set  to  zero  all  the  data 
areas  before  performing  the  experiment, 

RT(2):  Binary  square  root  simulation  program.  Contains 
counters  that  count  up  number  of  Iterations,  normal¬ 
izing  shifts,  and  corrections  for  each  operand. 

Flow  diagram  given  In  Fig,  3-6  . 

FFXSRT:  Identifies  the  range  of  each  operand  by  compar¬ 
ing  it  against  a  table  (PFXTBL),  placing  an  address 
modifier  In  Index  register  1  so  that  the  results  may 
be  determined  versus  operand  magnitude. 

SU3T0T:  Takes  the  tally  of  the  fixed-point  counters, 
converts  them  to  floating  point,  and  computes  the 
output  information. 

RBIT  B  root  bits  per  iteration; 

P3HFT  =  shifts  per  operand; 

PXITER  =  Iterations  per  operand; 

PCORR  =  corrections  per  operand; 

PFREQ  -s  relative  frequency  of  operands. 

OUTPUT:  Contains  the  output  formats.  Prints  out  the 
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quantities  computed  by  SU3T0T  every  1024  operands 
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ADD  INTI 

STD  J6R0UP 


call  TATTLE»J6R0UP  visual  display  of  group  counter 
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UJ  o  c 

QC 

1-  Z  Q 

QC  UJ 

(O  X  UJ  < 

Ui  3 

h- 

u  o  ^ 

►-  to 

O  UI  3 

to  z 

►- 

CD 

UJ  z  <  QC 

«  < 

lO 

QC  U  ^  Ui 

o 

ui 

O 

►- 

UJ  QC  »- 

to  1- 

z 

10 

u. 

h-  QC  QC  lO 

QC  O  O 

QC 

z 

Ul 

QC  Ui  Ui  *"1 

U.  C 

*H  QC 

o  a 

o 

o  o 

to 

c  :> »"  o 

UI  QC 

UI 

uO  h- 

o 

< 

u  *- 

QC  QC 

X  lO  to  UI 

QC  1- 

QC  to  1^ 

a  o;  < 

H- 

UJ 

<  < 

QC 

O  Ui 

to  Z  —  QC 

<  O  UI 

UI  QC  Z 

O  UJ  u 

< 

-i 

O 

o 

<  o 

Z)  O  QC 

to  UI  3 

UJ  O  •-• 

o 

<  z  UJ  UJ  a 

O  QC  < 

UI  ^  O 

z  <  z  o 

QC 

1-  Q 

z 

U  (-i 

^  QC  QC  o  lO  iO  D 

H  to  u 

^  u  -•  z 

o 

O 

z 

o 

^  < 

<0  HUI  QOQCZ*^l 

K  ^  ^ 

z 

u. 

QC  >- 

a  T 

Z  U  QC  VJ 

Z  QC  U  lO 

UJ  o  00 

D  Q  Z 

z 

QC 

Z  UJ 

UI  < 

UI  Nl 

W  UI  3 

O  Z  UJ  o 

h“ 

UJ  u 

O 

^  QC 

o  c  o  u 

-1  O  Q 

QC  Z  QC  to 

z  Qc  —  a  »- 

a 

lO 

QC  QC 

u. 

I-  z  z 

QC  J  UJ 

<  < 

o: 

D 

UJ 

o 

a  a 

<  ^  -J  I-  O  <  QC 

Z  a  K  K 

D  »-  UJ  UJ  QC 

K 

»  f- 

1“ 

UJ  UJ 

^  QC  <  < 

^  1  Z  < 

H  UJ  U. 

K  to  >  >  r 

H- 

O 

^  to 

lO  •-* 

o  >  > 

lO  UI  z  ^ 

UJ  UJ  <  <  UJ 

UJ 

z 

UJ 

UJ 

<  < 

<  a  UJ  QC 

^  o  o  o 

Z  O  Z  I 

QC  H  to  to  QC 

to 

O  11 

II  QC 

h-  11 

«  to  lO 

zOQci-c:^^ztouju.i*^vo 

< 

X 

> 

►-  >  tO 

^  4  U.  to  X 

<f  »  UJ  UJ 

«  M  J  00  (T 


h-  O 

o 

>  h*  O 

lO  >  1^  O 

X(/>4’4'  0«H 

•  ++  UJH-**  r-  O* 
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>  UJ  »*  H 

>t/)  OC  Z  »-0^r-l>U. 

«/)X  a^oOtocDHtoi 

¥->  H>uji/)auju.-jt0OH»Qzac(/> 

(fi  rZ)ttXCQrKOXQCaZ^K«« 

a  UJ  ^ 
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u 

CL 

s 


UJ 

z 

a  H- 

O 

z 

UJ  z 

u. 

UJ  «/)  z 

h-  3 

h-  a  o 

z  O 

• 

Z  to  UI  *-i 

3  U 

o 

3  Z  H-  *- 

C  1 

z 

o  UI  Z  3 

O  s 
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1  3 

X 

*/>  Z  O  ►-•  O 

QD  to 

o 

Z  Ui  Z  3  U  Z  Z 

D 

3 

UI  z  UI  O  ^  < 

to  Z 

z 

1-  3  >-  U  z  *0  Z 

o 

< 

Z  Z  -i  z  O  »-•  UI 

Q  O  <  O  O  »-  O 

<  O  >-  QC  Z  U. 

M  q:WK<0<0 
►-I-OOU-QCH-QC 

_  zoaxi^“tjo-0 

mQC  MOCLUOt/i^OCOZ 
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«  < 

h~  QC 

<  o 

a  H- 
UJ  tO 


I 

£ 


§ 
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I 

M 

I 


« 

t. 

I 


2 
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S 


i/)  i/) 

a  CO 
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X  X  ?:  X  z  X  jx  X  a 
oooooooooz 
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CC  Ct 

ui  oe 

H-  O 

M  SJ 

* 


x 

o 

a 

z 

< 

z 
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z  UJ  Z  UJ  O  K  3 

tC  X  «A  ^  U  Z 
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PREFIX  IDENTIFYING  ROUTINE 


a 

i/> 

K 

u. 

a 


z 

to 

oc 

c  a; 

o 

Ui  o 

>— 

KJ 

to 

< 

<  < 

oc 

KJ 

^  KJ 

o 

*••4 

Q.  t/> 

>- 

a 

z  n 

< 

z 

i/>  o  z 

>- 

u 

M  ^ 

iO 

< 

LU 

o 

o  o 

o  o  o  o 

o 

o 

o  n  o 

o  o  o  o 

UI 

z  u  O 

I- 

z 

o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

lO 

UJ 

•-• 

o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

z 

Q 

o 

o 

o 

o  o  o  o 

o  o 

o  o  c 

o  o  o  o 

UI 

Ll  Z  O 

z 

lU 

o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

to 

«  z 

1.^ 

z  z 

o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

o  < 

I 

O  QC 

o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

o 

O  LU  Z 

K  D 

o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

UI 

Z  >  UJ 

►- 

to  H 

o 

o 

O  O  OiO 

o 

o 

o  o  o 

o  o  o  o 

> 

<  a 

< 

UJ  UJ 

o 

^  O  '4*  O 

-4 

o 

4*  o  4- 

O  4  O  4 

< 

to  iO  o 

z 

a  z 

r- 

^  \0  its 

-4 

(A  fO  rg 

rg  r-*  ^  o 

to 

to 

rH 

eH  ^  ^ 

I-* 

tu 

z 

LU  O 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

o  o 

o 

-1  o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

o  o 

o 

CQ 

O 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

o  o 

< 

<  o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

o  o 

rH 

»-  o 

o 

o  o  o  o 

o 

o 

o  o  o 

c  o  o  o 

o  o 

h- 

♦ 

t-4 

O 

o 

o  o  o  o  o 

o 

o  o  o  o  o  o  o 

o  o 

u 

» 

K  O 

o 

o  o  o  o 

o 

o 

o  o  c 

coco 

o  o 

LU 

CD  ^ 

^-e 

o 

o 

o  o  o  o 

o 

o 

o  o  o 

o  o  o  o 

o  o 

a  to 

rH 

>- 

lO 

u. 

o 

o 

o  o  o  o  o 

o 

o  o  o  o  o  o  o 

o  o 

z  o 

• 

K 

eM 

a  ^ 

Ui  vC  rvj 

rg  ^  rsi 

^  eg 

«Oog«o<M«£rg^040 

o  z  • 

(>i  u. 

1 

z  » 

oc 

r- 

^  lA 

<4 

-4 

fo  rg  rg  r-»  •-*  o 

o  o 

u  « *-< 

fO 

a 

♦ 

rsj 

a  r-l 

r-l 

f-4 

r-4 

•-(  rH 

rH  ^  eH 

•H  1-4 

rg 

>- 

oc  ♦ 

ZOiiJKOXZ*-Oau.UUUUV^UUUUUUUUOUOU(/)Z 

uJzai/)^<o»-^Kaooooooooooooooooocouj 


K 

tt 

(D  to 

to 

»-  o 

K 

X  z 

u 

u.  i-i 

a 

a 
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Table  A-3»  Prefix  Identifying  Routine,  0.25  ^  x  <  0.5 


a  oc 

UJ  o 

O  H* 

40 

<  < 

oc 

^  o 

o 

a  i/>  •--> 

K 

a  o 

< 

40  o  z 

u 

M  h-  M 

40 

»«e 

< 

Ul 

o 

a  u  O 

z 

UJ 

o 

u  z  o 

z 

Ul 

-1  ^  z 

z 

o  < 

z 

O  a: 

UJ 

O  UJ  GC 

U 

►-  o 

z 

Z  >  Ut 

iO  H- 

<  a 

< 

UJ  UJ 

K 

lO  m  o 

X 

oc  a 
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lO 

O 

Ul 

cc 

z 

o 

o 

o 

fH 

z 

< 

ft 

M 

Fi4 

>- 

♦ 

U  h- 

u 

•J 

ft 

—  QC 

Ul 

(O 

Fg 

m 

K  1/)  ^ 

►- 

•  40 

Z  K 

K  O  4  • 

K 

*^0  4 

UJ  u 

O  z  »  eg 

U 

1  Z  • 

o  a 

o  in  u 

*  eg 

o 
.  o 
o 
o 
o 
o 
o 
o 
o 
o 
r*- 
<*> 

UJ  o 
-I  o 
ro  o 

<  o 

f-  o 
o 

K  O 
•-I  O 
U.  O 
UJ  ^ 


o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  in 
cn  m 
•  *  » 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  o  o 
4  4“ 

^  in 


o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
m  nj 

•  » 

o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 

n  ni 


o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
^  o 
cn 

*  » 

o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 

i-»  o 


o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
^  in 

CM  (M  CM  OJ 

ft  ft  •  » 

o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  o 
o  o  o  c 
o  o  o  o 
o  o  o  o 
^ 

N  ^  in  ^ 
rvi  rg  r>j  rg 


o  o  o 
o  o  o 
o  o  o 
o  o  o 
o  oo 
o  oo 
o  oo 
o  o  o 
o  oo 
o  o  o 

tn  rg 
^g  rg  CM 

ft  ft  ft 

o  o  o 
o  oo 
o  o  o 
o  oo 
o  oo 
o  oo 
o  o  o 
o  oo 
o  o  o 

cn  nj  ^ 
ni  rg  eg 


I/) 

QC 

o 

►- 

< 

yj 


UJ 

yn 

z 

UJ 

m 

o 

UJ 

> 

< 

%n 


o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
o  o 
^  o 
o  o 
nj  rj  f-g 


K 

U  QC  * 

azoujKOKZft-oaujuuuuuuouvjuuuuuuuu^z 


aujZK40u<b»- 

^WZOOOOOOOOOOCOOOOOOOSUI 

Z 

CD  40 

40 

1-  o 

K 

K  Z 

U 

u 

•  a 

a 
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Table  A-4:  Prefix  Identifying  Routine,  0*5  S  x  < 1 


IM 

o 

LJ 

QC 

U. 


fM 

CO 


^  DC 

UJ 

o  a 

z 

sC  o 

r4 

w  9k 

D 

CM 

O 

>“  rg 

o 

z 

X  PO 

UJ  X 

Z 

QC  D 

z 

3  QC 

U  Z 

O 

O  UJ 

•  X 

•  ►“ 

QC  • 

M 

QC  Z 

< 

O  X 

O  • 

IM 

(O  » 

U  CO 

pH 

•  • 

-J 

w  rg 

00  QC  vn 

< 

rH  <0 

X  UJ 

>-  ^ 

D  K-  O  Ui 

a 

K 

X  ►“ 

z  •  a 

z 

M 

X  u. 

X 

>-«  X  rg  X  o  < 

z 

U  X 

c 

•  >“  *  ^  ^- 

CM 

o 

M 

a  to 

a 

X  X  X  »-  U- 

O 

• 

o  o 

UJ 

z 

X  D  X  U  •  >- 

•  o 

• 

• 

z 

o 

z  z 

< 

DZDIcmOOp^ 

o 

N 

o  o 

M 

z 

o  o 

QC 

z 

O  X  Q  to  ►-*  Q. 

• 

u 

H 

M 

■ 

< 

-  Z  O 

•H 

D 

VO  to 

z 

z 

z  z  z  z  ^ 

n 

«« 

O 

>- 

z  z 

c  o  o  o  o  o  < 

CO 

o 

QC 

W 

z 

D 

Ul  UJ 

X  X 

X  X  X  X  X  O  X 

pH 

h- 

tu 

QC 

o 

CD 

a 

X  X 

X 

X 

X  X  X  X  a  < 

z> 

u 

QC 

UJ 

D 

z 

»-«  M 

COCOOOOUIZOZ 

O 

oc 

to 

o  o 

UUVJUUVJUQC 

X 

O  VO 

X 

yj 

u 

UJ 

z  z 
ct 
D 

Z  »-  O 
O  Uj  Z 
O  a  UJ 


3 

»4 

«> 

P 


4* 

« 

P« 

o 


'  « 

c 


p 

o 

K 


%• 

«r\ 

I 

< 


A 

& 


yj\j  u 


CO 
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rsi 

tr\ 

OQ 

o 

X 

lU 

D 

QC 

Z 

U. 

K 

o. 

ft 

ft  A 

Z 

—  fM 

D 

INi  CO 

Z 

iM  m  — 

K 

O 

ft 

w  Qc  rvi  uj 

O 

O  oc  QC 

Ul 

UJ  O  u. 

cr 

a  u  o  >< 

u. 

u.  a  uj  * 

I-,  ft  a 

ft 

•  ^  U.  fVI 

QC 

rg  *  (n 

a 

rg  ^ 

o 

(<>  w  rg  ct 

u  o 

—  oc  cn  QC 

^  LU 

q:  UJ  O 

ft  a 

oc  h-  a  u 

cr  u. 

O  •-  QC  K 

uj  a 

cr 

KJ  X  O 

H-  • 

Ul 

^  Cl  KJ  ^ 

^  QC 

1- 

CM 

fM  CM 

U-  u 

o 

fO  fO  QC 

X  a 

UJ 

Lm 

W 

w  w  UJ 

lO  • 

QC 

QC 

^  oc 

u. 

oc 

ftn* 

W 

UJ 

U.  U- 

ft  UJ 

ft 

LU 

w 

•>4 
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H- 

XXX 

X  *- 

QC 

w 

t- 

W 

Q^ 

•i^ 

UJ 

GC 

to  O  X 

u  « 

QC 

H- 

QC 

O 

UJ 

O 

QC 

O 

X 

ft 

Q.  ft  ft 

UJ  X  o 

u. 

X 

a 

LU 

1- 

LU 

U. 

UJ 

D 

UJ  ^ 

ft  <ii»  M 

X  a 

yj 

X 

X 

o 

QC 

a 

V 

QC 

Z 

z  eg 

CM  CM 

VJ  • 

ft 

-f 

U. 

X 

u. 

u. 

X 

CM  m  cn 

l-l  h- 

QC 

X 

X 

X 

♦ 

N. 

X* 

o  1 - 

m  «->  — 

»  u. 

UJ 

X 

•f 

+ 

-f 

-*• 

D  K 

«-*  QC  >- 

QC  X 

h- 

D 

W 

W 

QC 

M 

CD  O  U. 

H-  UJ  UL 

O  to 

z 

h- 

ft>a 

Z  QC 

QC 

o 

Ul 

•- 

D  OC  I 

«  ►-  X 

X  QC  a 

X 

X 

u. 

w 

UJ 

UJ 

QC 

w 

LU 

H- 

►- 

►- 

QC 

O 

to 

lO 

CC  ^  to  O  QC  • 

ft 

♦ 

fM 

X 

K 

►- 

O  QC 

QC  O 

U. 

U. 

M 

OC 

UJ 

O  i-« 

QC  K  X  O  UJ  h- 

►-  CD 

iO  u. 

U  QC 

Ul 

UJ 

X 

I 

X 

o 

oc 

LU 

z 

2  .-1 

u. 

X 

ft 

X 

N 

X 

C 

QC 

U) 

40 

N 

u 

Ul 

Z 

«  2 

z  z  z 

<  »  03 

X 

D  ^ 

M 

to 

M 

u 

O 

II 

U. 

w 

H 

N 

U 

-JOOOOOCZOC 

10  z 

H 

N 

—• 

«•» 

II 

II 

N 

UJ 

»- 

< 

M  M  »i>4 

X 

M 

w 

W 

3 

K  to 

lO  iO  lO  z  z  z 

z 

N 

w 

QC 

M 

w 

M 

QC 

w 

Z 

z 

o  c  z 

ZZZOOOO-DO 

»- 

UJ 

QC 

QC 

%l^ 

O 

W 

LU 

oc 

O 

oc 

QC 

Uj 

UJ  UJ  UJ 

XXX 

X 

•r 

u. 

»- 

H>  UJ 

QC 

QC  UJ 

o 

h- 

U. 

oc 

UJ 

»-■ 

3 

cc 

(C  X 

X  X  ^ 

XXX 

X 

x> 

X 

u. 

•M 

h- 

O 

QC 

QC 

UJ 

M 

X 

o 

oc 

z 

H  o 

D 

D  ^ 

»ii« 

c  o  o  c  r 

c 

lO 

X 

X 

W 

C  Ll 

Of 

QQ 

lO 

X 

u 

u 

O  UJ  z 

to 

to  a 

O  O  O  O  LJ  U  U 

X  o 

X 

iO 

X 

K 

X 

u 

X 

u. 

QC 

a 

Q. 

a 

a 

QC  UJ 

u 
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Ttiblm  A-6t  Subtotalin^  Routinet  Property  Distribution^ 


SUBROUTINE  OUTPUT 


o 

•-» 

o 


u. 

O  r-i  o 

04 

o  O 

Q 

rH  r-C 

w 

rsi  ui 

o 

i/> 

UJ 

M  t/) 

oc 

lU 

Uu 

z  u 

o  o  o  o 

a 

o 

o  o  o  o 

» 

•  oc 

O  r-c  o 

z  a 

o  O 

rsi 

o 

r-l  ^  r-< 

cn 

«  >- 

40 

w 

QC 

<  -j 

QC 

o  o  o  o 

a. 

c 

D  U. 

o  o  o  o 

3 

Z  i/) 

O'-O'-O  — 

o 

CL 

i/) 

O  »™4  O 

-*  o: 

(/)  UJ 

rg 

rg 

(<>  o 

u 

f-H  rH  • 

• 

• 

(M 

►-  u 

O' 

O'  4/) 

O  D 

o  o  u. 

u. 

u. 

O  O  t/) 

f-4  rH  «-l  GO 

oo 

00  X 

Ot 

Ui  QC 

UJ 

QC  iA 

K- 

U-  >-  O 

«-• 

^  z 

0.  q:  Z 

QKOXOXOXO 

DC  •"« 

K 

•  <  < 

5?  ^  ^ 

UJ 

CL 

q:  z  o: 

^U.»^U.^Ur-*Cti— «<-*■  QC 
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Table  A-7s  Output  Routine,  Property  Distribution 


Fro.'^rp.rns  for  the  Timing^  Distribution 

:  Calls  the  Input  routine,  ser.erates  the  pseudo¬ 
random  operands  takes  their  square  root  one  at  a 
time,  fills  In  the  tlmlns  distribution,  and  calls 
the  output  routine  at  the  end  of  the  experiment. 

Flow  diagram  given  In  Pig,  A-2, 

INPUT2 :  Reads  In  the  number  of  operands  to  be  processed, 
RT(2) :  Binary  square  root  simulation  program.  Uses 

Index  register  2  to  count  up  number  of  time  units 
required  to  execute  each  square  root.  Flow  diagram 
given  In  Fig,  3-6A, 

0b’TFT2:  The  timing  distribution,  IQ  or  JQ,  la  the  tim¬ 
ing  density  function.  The  normalized  cumulative 
distribution  function  Is  computed  and  placed  In 
XQ,  All  nonzero  entries  of  JQ  are  printed  out,  and 
all  entries  of  XQ  are  printed  out. 
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CALL  RT(2).X  EXTRACT  SQUARE  ROOT 

cla  io+i+(c»2  augment  timing  distribution 
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